CONCRETE
MATHEMATICS
Dedicated to Leonhard Euler (1707-1783)
Ronald L. Graham
AT&T Bell Laboratories
Donald E. Knuth
Stanford University
Oren Patashnik
Stanford University
ADDISON-WESLEY PUBLISHING COMPANY
Reading, Massachusetts
Menlo Park, California
New York
Don Mills, Ontario
Wokingham, England
Amsterdam
Bonn
Sydney Singapore Tokyo Madrid
San Juan
Library of Congress Cataloging-in-Publication Data
Graham, Ronald Lewis, 1935-
Concrete mathematics : a foundation for computer science / Ronald L. Graham,
Donald E. Knuth, Oren Patashnik.
xiii, 625 p. 24 cm.
Bibliography: p. 578
Includes index.
ISBN 0-201-14236-8
1. Mathematics--1961- 2. Electronic data processing--Mathematics.
I. Knuth, Donald Ervin, 1938- . II. Patashnik, Oren, 1954- . III. Title.
QA39.2.C733 1988
510--dc19
88-3779
CIP
Sixth printing, with corrections, October 1990
Copyright © 1989 by Addison-Wesley Publishing Company
All rights reserved. No part of this publication may be reproduced, stored in a
retrieval system or transmitted, in any form or by any means, electronic,
mechanical, photocopying, recording, or otherwise, without the prior written
permission of the publisher. Printed in the United States of America. Published
simultaneously in Canada.
FGHIJK-HA-943210
Preface
“Audience, level, and treatment: a description of such matters is what
prefaces are supposed to be about.”
-P. R. Halmos [142]
“People do acquire a little brief authority by equipping themselves with
jargon: they can pontificate and air a superficial expertise. But what we
should ask of educated mathematicians is not what they can speechify about,
nor even what they know about the existing corpus of mathematical knowledge,
but rather what can they now do with their learning and whether they can
actually solve mathematical problems arising in practice. In short, we look
for deeds not words.”
-J. Hammersley [145]
THIS BOOK IS BASED on a course of the same name that has been taught
annually at Stanford University since 1970. About fifty students have taken it
each year (juniors and seniors, but mostly graduate students), and alumni
of these classes have begun to spawn similar courses elsewhere. Thus the time
seems ripe to present the material to a wider audience (including sophomores).
It was a dark and stormy decade when Concrete Mathematics was born.
Long-held values were constantly being questioned during those turbulent
years; college campuses were hotbeds of controversy. The college curriculum
itself was challenged, and mathematics did not escape scrutiny. John
Hammersley had just written a thought-provoking article “On the enfeeblement
of mathematical skills by ‘Modern Mathematics’ and by similar soft intellectual
trash in schools and universities” [145]; other worried mathematicians [272]
even asked, “Can mathematics be saved?” One of the present authors had
embarked on a series of books called The Art of Computer Programming, and
in writing the first volume he (DEK) had found that there were mathematical
tools missing from his repertoire; the mathematics he needed for a thorough,
well-grounded understanding of computer programs was quite different from
what he’d learned as a mathematics major in college. So he introduced a new
course, teaching what he wished somebody had taught him.
The course title “Concrete Mathematics” was originally intended as an
antidote to “Abstract Mathematics,” since concrete classical results were
rapidly being swept out of the modern mathematical curriculum by a new wave
of abstract ideas popularly called the “New Math.” Abstract mathematics is a
wonderful subject, and there’s nothing wrong with it: It’s beautiful, general,
and useful. But its adherents had become deluded that the rest of mathemat-
ics was inferior and no longer worthy of attention. The goal of generalization
had become so fashionable that a generation of mathematicians had become
unable to relish beauty in the particular, to enjoy the challenge of solving
quantitative problems, or to appreciate the value of technique. Abstract math-
ematics was becoming inbred and losing touch with reality; mathematical ed-
ucation needed a concrete counterweight in order to restore a healthy balance.
When DEK taught Concrete Mathematics at Stanford for the first time,
he explained the somewhat strange title by saying that it was his attempt
to teach a math course that was hard instead of soft. He announced that,
contrary to the expectations of some of his colleagues, he was not going to
teach the Theory of Aggregates, nor Stone’s Embedding Theorem, nor even
the Stone-Čech compactification. (Several students from the civil engineering
department got up and quietly left the room.)
Although Concrete Mathematics began as a reaction against other trends,
the main reasons for its existence were positive instead of negative. And as
the course continued its popular place in the curriculum, its subject matter
“solidified” and proved to be valuable in a variety of new applications. Mean-
while, independent confirmation for the appropriateness of the name came
from another direction, when Z. A. Melzak published two volumes entitled
Companion to Concrete Mathematics [214].
The material of concrete mathematics may seem at first to be a disparate
bag of tricks, but practice makes it into a disciplined set of tools. Indeed, the
techniques have an underlying unity and a strong appeal for many people.
When another one of the authors (RLG) first taught the course in 1979, the
students had such fun that they decided to hold a class reunion a year later.
But what exactly is Concrete Mathematics? It is a blend of CONtinuous
and disCRETE mathematics. More concretely, it is the controlled manipulation
of mathematical formulas, using a collection of techniques for solving prob-
lems. Once you, the reader, have learned the material in this book, all you
will need is a cool head, a large sheet of paper, and fairly decent handwriting
in order to evaluate horrendous-looking sums, to solve complex recurrence
relations, and to discover subtle patterns in data. You will be so fluent in
algebraic techniques that you will often find it easier to obtain exact results
than to settle for approximate answers that are valid only in a limiting sense.
The major topics treated in this book include sums, recurrences, ele-
mentary number theory, binomial coefficients, generating functions, discrete
probability, and asymptotic methods. The emphasis is on manipulative tech-
nique rather than on existence theorems or combinatorial reasoning; the goal
is for each reader to become as familiar with discrete operations (like the
greatest-integer function and finite summation) as a student of calculus is
familiar with continuous operations (like the absolute-value function and in-
finite integration).
Notice that this list of topics is quite different from what is usually taught
nowadays in undergraduate courses entitled “Discrete Mathematics.” Therefore
the subject needs a distinctive name, and “Concrete Mathematics” has
proved to be as suitable as any other.
The original textbook for Stanford’s course on concrete mathematics was
the “Mathematical Preliminaries” section in The Art of Computer Programming
[173]. But the presentation in those 110 pages is quite terse, so another
author (OP) was inspired to draft a lengthy set of supplementary notes. The
present book is an outgrowth of those notes; it is an expansion of, and a more
leisurely introduction to, the material of Mathematical Preliminaries. Some of
the more advanced parts have been omitted; on the other hand, several topics
not found there have been included here so that the story will be complete.
“The heart of mathematics consists of concrete examples and concrete
problems.”
-P. R. Halmos [141]
“It is downright sinful to teach the abstract before the concrete.”
-Z. A. Melzak [214]
Concrete Mathematics is a bridge to abstract mathematics.
“The advanced reader who skips parts that appear too elementary may miss
more than the less advanced reader who skips parts that appear too complex.”
-G. Pólya [238]
(We’re not bold enough to try Distinuous Mathematics.)
“. . . a concrete life preserver thrown to students sinking in a sea of
abstraction.”
-W. Gottschalk
Math graffiti: Kilroy wasn’t Haar. Free the group. Nuke the kernel.
Power to the n. N=1 ⇒ P=NP.
I have only a marginal interest in this subject.
This was the most enjoyable course I’ve ever had. But it might be nice
to summarize the material as you go along.
The authors have enjoyed putting this book together because the subject
began to jell and to take on a life of its own before our eyes; this book almost
seemed to write itself. Moreover, the somewhat unconventional approaches
we have adopted in several places have seemed to fit together so well, after
these years of experience, that we can’t help feeling that this book is a kind
of manifesto about our favorite way to do mathematics. So we think the book
has turned out to be a tale of mathematical beauty and surprise, and we hope
that our readers will share at least ε of the pleasure we had while writing it.
Since this book was born in a university setting, we have tried to capture
the spirit of a contemporary classroom by adopting an informal style. Some
people think that mathematics is a serious business that must always be cold
and dry; but we think mathematics is fun, and we aren’t ashamed to admit
the fact. Why should a strict boundary line be drawn between work and
play? Concrete mathematics is full of appealing patterns; the manipulations
are not always easy, but the answers can be astonishingly attractive. The
joys and sorrows of mathematical work are reflected explicitly in this book
because they are part of our lives.
Students always know better than their teachers, so we have asked the
first students of this material to contribute their frank opinions, as “graffiti”
in the margins. Some of these marginal markings are merely corny, some
are profound; some of them warn about ambiguities or obscurities, others
are typical comments made by wise guys in the back row; some are positive,
some are negative, some are zero. But they all are real indications of feelings
that should make the text material easier to assimilate. (The inspiration for
such marginal notes comes from a student handbook entitled Approaching
Stanford, where the official university line is counterbalanced by the remarks
of outgoing students. For example, Stanford says, “There are a few things
you cannot miss in this amorphous shape which is Stanford”; the margin
says, “Amorphous . . . what the h*** does that mean? Typical of the
pseudo-intellectualism around here.” Stanford: “There is no end to the
potential of a group of students living together.” Graffito: “Stanford dorms
are like zoos without a keeper.”)
The margins also include direct quotations from famous mathematicians
of past generations, giving the actual words in which they announced some
of their fundamental discoveries. Somehow it seems appropriate to mix the
words of Leibniz, Euler, Gauss, and others with those of the people who
will be continuing the work. Mathematics is an ongoing endeavor for people
everywhere; many strands are being woven into one rich fabric.
I’m unaccustomed to this face.
Dear prof: Thanks for (1) the puns, (2) the subject matter.
I don’t see how what I’ve learned will ever help me.
I had a lot of trouble in this class, but I know it sharpened my math
skills and my thinking skills.
I would advise the casual student to stay away from this course.
or chalk. (For example, one of the trademarks of the new design is the symbol
for zero, ‘0’, which is slightly pointed at the top because a handwritten zero
rarely closes together smoothly when the curve returns to its starting point.)
The letters are upright, not italic, so that subscripts, superscripts, and ac-
cents are more easily fitted with ordinary symbols. This new type family has
been named AMS Euler, after the great Swiss mathematician Leonhard Euler
(1707-1783) who discovered so much of mathematics as we know it today.
The alphabets include Euler Text (Aa Bb Cc through Xx Yy Zz), Euler Fraktur
(𝔄𝔞 𝔅𝔟 ℭ𝔠 through 𝔛𝔵 𝔜𝔶 ℨ𝔷), and Euler Script Capitals (𝒜 ℬ 𝒞 through
𝒳 𝒴 𝒵), as well as Euler Greek (Αα Ββ Γγ through Χχ Ψψ Ωω) and special
symbols such as ℘ and ℵ. We are especially pleased to be able to inaugurate
the Euler family of typefaces in this book, because Leonhard Euler’s spirit
truly lives on every page: Concrete mathematics is Eulerian mathematics.
The authors are extremely grateful to Andrei Broder, Ernst Mayr, An-
drew Yao, and Frances Yao, who contributed greatly to this book during the
years that they taught Concrete Mathematics at Stanford. Furthermore we
offer 1024 thanks to the teaching assistants who creatively transcribed what
took place in class each year and who helped to design the examination ques-
tions; their names are listed in Appendix C. This book, which is essentially
a compendium of sixteen years’ worth of lecture notes, would have been im-
possible without their first-rate work.
Many other people have helped to make this book a reality. For example,
we wish to commend the students at Brown, Columbia, CUNY, Princeton,
Rice, and Stanford who contributed the choice graffiti and helped to debug
our first drafts. Our contacts at Addison-Wesley were especially efficient
and helpful; in particular, we wish to thank our publisher (Peter Gordon),
production supervisor (Bette Aaronson), designer (Roy Brown), and copy
editor (Lyn Dupré). The National Science Foundation and the Office of Naval
Research have given invaluable support. Cheryl Graham was tremendously
helpful as we prepared the index. And above all, we wish to thank our wives
(Fan, Jill, and Amy) for their patience, support, encouragement, and ideas.
We have tried to produce a perfect book, but we are imperfect authors.
Therefore we solicit help in correcting any mistakes that we’ve made. A re-
ward of $2.56 will gratefully be paid to the first finder of any error, whether
it is mathematical, historical, or typographical.
Murray Hill, New Jersey                 -RLG
and Stanford, California                 DEK
May 1988                                 OP
A NOTE ON NOTATION
Prestressed concrete mathematics is concrete mathematics that’s preceded by
a bewildering list of notations.

Notation                          Meaning                                             Page
$\left[{n \atop m}\right]$        Stirling cycle number (the “first kind”)            245
$\left\{{n \atop m}\right\}$      Stirling subset number (the “second kind”)          244
$\left\langle{n \atop m}\right\rangle$
                                  Eulerian number                                     253
$\left\langle\!\!\left\langle{n \atop m}\right\rangle\!\!\right\rangle$
                                  Second-order Eulerian number                        256
$(a_m \ldots a_0)_b$              radix notation for $\sum_{k=0}^{m} a_k b^k$         11
$K(a_1,\ldots,a_n)$               continuant polynomial                               288
$F\left({a_1,\ldots,a_m \atop b_1,\ldots,b_n}\,\middle|\,z\right)$
                                  hypergeometric function                             205
$\#A$                             cardinality: number of elements in the set A        39
$[z^n]\,f(z)$                     coefficient of $z^n$ in $f(z)$                      197
$[\alpha\,..\,\beta]$             closed interval: the set $\{x \mid \alpha \le x \le \beta\}$
                                                                                      73
$[m = n]$                         1 if m = n, otherwise 0 *                           24
$[m \backslash n]$                1 if m divides n, otherwise 0 *                     102
$[m \backslash\!\backslash n]$    1 if m exactly divides n, otherwise 0 *             146
$[m \perp n]$                     1 if m is relatively prime to n, otherwise 0 *      115
*In general, if S is any statement that can be true or false, the bracketed
notation [S] stands for 1 if S is true, 0 otherwise.
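The bracket convention maps directly onto a boolean-to-integer conversion. The sketch below is only an illustration, not part of the text: the function names are hypothetical, and the divisibility case assumes m > 0.

```python
from math import gcd


def bracket(statement):
    """[S]: 1 if the statement S is true, 0 otherwise."""
    return 1 if statement else 0


def equals(m, n):
    """[m = n]"""
    return bracket(m == n)


def divides(m, n):
    """[m divides n]; assumes m > 0 (an assumption of this sketch)."""
    return bracket(n % m == 0)


def coprime(m, n):
    """[m is relatively prime to n]"""
    return bracket(gcd(m, n) == 1)


# Brackets turn counting problems into sums: the number of divisors of 12.
divisor_count = sum(divides(d, 12) for d in range(1, 13))
```

Writing a condition as a 0/1 factor in this way lets a restricted sum be expressed as an unrestricted one.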
Throughout this text, we use single-quote marks (‘. . .’) to delimit text as
it is written, double-quote marks (“. . .”) for a phrase as it is spoken. Thus,
the string of letters ‘string’ is sometimes called a “string.”
Also ‘nonstring’ is a string.
An expression of the form ‘a/bc’ means the same as ‘a/(bc)’. Moreover,
log x/log y = (log x)/(log y) and 2n! = 2(n!).
Contents
1 Recurrent Problems
  1.1 The Tower of Hanoi
  1.2 Lines in the Plane
  1.3 The Josephus Problem
  Exercises
2 Sums
  2.1 Notation
  2.2 Sums and Recurrences
  2.3 Manipulation of Sums
  2.4 Multiple Sums
  2.5 General Methods
  2.6 Finite and Infinite Calculus
  2.7 Infinite Sums
  Exercises
3 Integer Functions
  3.1 Floors and Ceilings
  3.2 Floor/Ceiling Applications
  3.3 Floor/Ceiling Recurrences
  3.4 ‘mod’: The Binary Operation
  3.5 Floor/Ceiling Sums
  Exercises
4 Number Theory
  4.1 Divisibility
  4.2 Primes
  4.3 Prime Examples
  4.4 Factorial Factors
  4.5 Relative Primality
  4.6 ‘mod’: The Congruence Relation
  4.7 Independent Residues
  4.8 Additional Applications
  4.9 Phi and Mu
  Exercises
5 Binomial Coefficients
  5.1 Basic Identities
  5.2 Basic Practice
  5.3 Tricks of the Trade
  5.4 Generating Functions
  5.5 Hypergeometric Functions
  5.6 Hypergeometric Transformations
  5.7 Partial Hypergeometric Sums
  Exercises
6 Special Numbers
  6.1 Stirling Numbers
  6.2 Eulerian Numbers
  6.3 Harmonic Numbers
  6.4 Harmonic Summation
  6.5 Bernoulli Numbers
  6.6 Fibonacci Numbers
  6.7 Continuants
  Exercises
7 Generating Functions
  7.1 Domino Theory and Change
  7.2 Basic Maneuvers
  7.3 Solving Recurrences
  7.4 Special Generating Functions
  7.5 Convolutions
  7.6 Exponential Generating Functions
  7.7 Dirichlet Generating Functions
  Exercises
8 Discrete Probability
  8.1 Definitions
  8.2 Mean and Variance
  8.3 Probability Generating Functions
  8.4 Flipping Coins
  8.5 Hashing
  Exercises
9 Asymptotics
  9.1 A Hierarchy
  9.2 O Notation
  9.3 O Manipulation
  9.4 Two Asymptotic Tricks
  9.5 Euler’s Summation Formula
  9.6 Final Summations
  Exercises
A Answers to Exercises
B Bibliography
C Credits for Exercises
Index
List of Tables
Recurrent Problems
THIS CHAPTER EXPLORES three sample problems that give a feel for
what’s to come. They have two traits in common: They’ve all been investi-
gated repeatedly by mathematicians; and their solutions all use the idea of
recurrence, in which the solution to each problem depends on the solutions
to smaller instances of the same problem.
Raise your hand if you’ve never seen this.
OK, the rest of you can cut to equation (1.1).
1.1 THE TOWER OF HANOI
Let’s look first at a neat little puzzle called the Tower of Hanoi,
invented by the French mathematician Edouard Lucas in 1883. We are given
a tower of eight disks, initially stacked in decreasing size on one of three pegs:
The objective is to transfer the entire tower to one of the other pegs, moving
only one disk at a time and never moving a larger one onto a smaller.
Lucas [208] furnished his toy with a romantic legend about a much larger
Tower of Brahma, which supposedly has 64 disks of pure gold resting on three
diamond needles. At the beginning of time, he said, God placed these golden
disks on the first needle and ordained that a group of priests should transfer
them to the third, according to the rules above. The priests reportedly work
day and night at their task. When they finish, the Tower will crumble and
the world will end.
Gold - wow. Are our disks made of concrete?
It’s not immediately obvious that the puzzle has a solution, but a little
thought (or having seen the problem before) convinces us that it does. Now
the question arises: What’s the best we can do? That is, how many moves
are necessary and sufficient to perform the task?
The best way to tackle a question like this is to generalize it a bit. The
Tower of Brahma has 64 disks and the Tower of Hanoi has 8; let’s consider
what happens if there are n disks.
One advantage of this generalization is that we can scale the problem
down even more. In fact, we’ll see repeatedly in this book that it’s advanta-
geous to LOOK AT SMALL CASES first. It’s easy to see how to transfer a tower
that contains only one or two disks. And a small amount of experimentation
shows how to transfer a tower of three.
The next step in solving the problem is to introduce appropriate notation:
NAME AND CONQUER. Let’s say that $T_n$ is the minimum number of moves
that will transfer n disks from one peg to another under Lucas’s rules. Then
$T_1$ is obviously 1, and $T_2 = 3$.
We can also get another piece of data for free, by considering the smallest
case of all: Clearly $T_0 = 0$, because no moves at all are needed to transfer a
tower of n = 0 disks! Smart mathematicians are not ashamed to think small,
because general patterns are easier to perceive when the extreme cases are
well understood (even when they are trivial).
But now let’s change our perspective and try to think big; how can we
transfer a large tower? Experiments with three disks show that the winning
idea is to transfer the top two disks to the middle peg, then move the third,
then bring the other two onto it. This gives us a clue for transferring n disks
in general: We first transfer the $n-1$ smallest to a different peg (requiring
$T_{n-1}$ moves), then move the largest (requiring one move), and finally transfer
the $n-1$ smallest back onto the largest (requiring another $T_{n-1}$ moves). Thus
we can transfer n disks (for n > 0) in at most $2T_{n-1} + 1$ moves:
$$T_n \le 2T_{n-1} + 1, \qquad \text{for } n > 0.$$
This formula uses ‘≤’ instead of ‘=’ because our construction proves only
that $2T_{n-1} + 1$ moves suffice; we haven’t shown that $2T_{n-1} + 1$ moves are
necessary. A clever person might be able to think of a shortcut.
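The three-step construction just described translates naturally into a recursive routine. The following is a minimal sketch, not taken from the text; the function names, the peg labels "A", "B", "C", and the move representation are all assumptions made for illustration. It shows only that the construction is legal and uses $2T_{n-1} + 1$ moves; it says nothing about whether fewer moves could suffice.

```python
def hanoi_moves(n, source="A", target="C", spare="B"):
    """Return the list of (disk, from_peg, to_peg) moves for n disks."""
    if n == 0:
        return []
    # Move the n-1 smallest disks out of the way, onto the spare peg.
    first = hanoi_moves(n - 1, source, spare, target)
    # Move the largest disk directly to the target peg.
    middle = [(n, source, target)]
    # Bring the n-1 smallest disks back on top of the largest.
    last = hanoi_moves(n - 1, spare, target, source)
    return first + middle + last


def is_legal(n, moves, source="A", target="C", spare="B"):
    """Replay the moves under Lucas's rules; True if the tower ends on target."""
    pegs = {source: list(range(n, 0, -1)), target: [], spare: []}
    for disk, src, dst in moves:
        if not pegs[src] or pegs[src][-1] != disk:
            return False          # the disk is not on top of its peg
        if pegs[dst] and pegs[dst][-1] < disk:
            return False          # a larger disk would land on a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs[target] == list(range(n, 0, -1))


for n in range(5):
    print(n, len(hanoi_moves(n)))
```

Replaying the generated moves with `is_legal` is a cheap way to confirm that the recursion never places a larger disk on a smaller one.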
But is there a better way? Actually no. At some point we must move the
largest disk. When we do, the $n-1$ smallest must be on a single peg, and it
has taken at least $T_{n-1}$ moves to put them there. We might move the largest
disk more than once, if we’re not too alert. But after moving the largest disk
for the last time, we must transfer the $n-1$ smallest disks (which must again
be on a single peg) back onto the largest; this too requires $T_{n-1}$ moves. Hence
$$T_n \ge 2T_{n-1} + 1, \qquad \text{for } n > 0.$$
Most of the published “solutions” to Lucas’s problem, like the early one
of Allardice and Fraser [7], fail to explain why $T_n$ must be $\ge 2T_{n-1} + 1$.
Yeah, yeah. I seen that word before.
Mathematical induction proves that we can climb as high as we like on a
ladder, by proving that we can climb onto the bottom rung (the basis)
and that from each rung we can climb up to the next one (the induction).
These two inequalities, together with the trivial solution for n = 0, yield
$$T_0 = 0;$$
$$T_n = 2T_{n-1} + 1, \qquad \text{for } n > 0. \tag{1.1}$$
(Notice that these formulas are consistent with the known values $T_1 = 1$ and
$T_2 = 3$. Our experience with small cases has not only helped us to discover
a general formula, it has also provided a convenient way to check that we
haven’t made a foolish error. Such checks will be especially valuable when we
get into more complicated maneuvers in later chapters.)
A set of equalities like (1.1) is called a recurrence (a.k.a. recurrence
relation or recursion relation). It gives a boundary value and an equation for
the general value in terms of earlier ones. Sometimes we refer to the general
equation alone as a recurrence, although technically it needs a boundary value
to be complete.
The recurrence allows us to compute $T_n$ for any n we like. But nobody
really likes to compute from a recurrence, when n is large; it takes too long.
The recurrence only gives indirect, “local” information. A solution to the
recurrence would make us much happier. That is, we’d like a nice, neat,
“closed form” for $T_n$ that lets us compute it quickly, even for large n. With
a closed form, we can understand what $T_n$ really is.
So how do we solve a recurrence? One way is to guess the correct solution,
then to prove that our guess is correct. And our best hope for guessing
the solution is to look (again) at small cases. So we compute, successively,
$T_3 = 2\cdot3+1 = 7$; $T_4 = 2\cdot7+1 = 15$; $T_5 = 2\cdot15+1 = 31$; $T_6 = 2\cdot31+1 = 63$.
Aha! It certainly looks as if
$$T_n = 2^n - 1, \qquad \text{for } n \ge 0. \tag{1.2}$$
At least this works for $n \le 6$.
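The small-case computation can be mechanized. The sketch below (the function name is an assumption of this illustration) iterates recurrence (1.1) and prints each value next to the guessed closed form. Agreement on finitely many cases is evidence for the guess, not a proof of it.

```python
def t_recurrence(n):
    """T_n computed from recurrence (1.1): T_0 = 0, T_n = 2*T_{n-1} + 1."""
    t = 0
    for _ in range(n):
        t = 2 * t + 1
    return t


# Tabulate the recurrence against the guess 2^n - 1 for the cases in the text.
for n in range(7):
    print(n, t_recurrence(n), 2**n - 1)
```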
Mathematical induction is a general way to prove that some statement
about the integer n is true for all $n \ge n_0$. First we prove the statement
when n has its smallest value, $n_0$; this is called the basis. Then we prove the
statement for $n > n_0$, assuming that it has already been proved for all values
between $n_0$ and $n-1$, inclusive; this is called the induction. Such a proof
gives infinitely many results with only a finite amount of work.
Recurrences are ideally set up for mathematical induction. In our case,
for example, (1.2) follows easily from (1.1): The basis is trivial, since
$T_0 = 2^0 - 1 = 0$. And the induction follows for n > 0 if we assume that (1.2)
holds when n is replaced by $n-1$:
$$T_n = 2T_{n-1} + 1 = 2(2^{n-1} - 1) + 1 = 2^n - 1.$$
Hence (1.2) holds for n as well. Good! Our quest for $T_n$ has ended successfully.
Of course the priests’ task hasn’t ended; they’re still dutifully moving
disks, and will be for a while, because for n = 64 there are $2^{64} - 1$ moves (about
18 quintillion). Even at the impossible rate of one move per microsecond, they
will need more than 5000 centuries to transfer the Tower of Brahma. Lucas’s
original puzzle is a bit more practical. It requires $2^8 - 1 = 255$ moves, which
takes about four minutes for the quick of hand.
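These estimates are easy to check with a few lines of arithmetic. The sketch below assumes a year of 365.25 days for the Tower of Brahma, and, for the eight-disk puzzle, a rate of one move per second; that rate is an assumption of this illustration, since the text says only "the quick of hand".

```python
MOVES_BRAHMA = 2**64 - 1                     # moves for 64 disks
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60     # assumption: 365.25-day year

# One move per microsecond, as in the text.
seconds = MOVES_BRAHMA / 1_000_000
centuries = seconds / SECONDS_PER_YEAR / 100

# Eight disks at an assumed rate of one move per second.
MOVES_HANOI = 2**8 - 1
hanoi_minutes = MOVES_HANOI / 60

print(f"{MOVES_BRAHMA:,} moves, about {centuries:.0f} centuries")
print(f"{MOVES_HANOI} moves, about {hanoi_minutes:.2f} minutes")
```

Under these assumptions the first figure comes out a little under 6000 centuries, consistent with "more than 5000 centuries".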
The Tower of Hanoi recurrence is typical of many that arise in applications
of all kinds. In finding a closed-form expression for some quantity of
interest like $T_n$ we go through three stages:
1 Look at small cases. This gives us insight into the problem and helps us
  in stages 2 and 3.
2 Find and prove a mathematical expression for the quantity of interest.
  For the Tower of Hanoi, this is the recurrence (1.1) that allows us, given
  the inclination, to compute $T_n$ for any n.
3 Find and prove a closed form for our mathematical expression. For the
  Tower of Hanoi, this is the recurrence solution (1.2).
What is a proof? “One half of one percent pure alcohol.”
The third stage is the one we will concentrate on throughout this book. In
fact, we’ll frequently skip stages 1 and 2 entirely, because a mathematical
expression will be given to us as a starting point. But even then, we’ll be
getting into subproblems whose solutions will take us through all three stages.
Our analysis of the Tower of Hanoi led to the correct answer, but it
required an “inductive leap”;
we relied on a lucky guess about the answer.
One of the main objectives of this book is to explain how a person can solve
recurrences without being clairvoyant. For example, we’ll see that recurrence
(1.1) can be simplified by adding 1 to both sides of the equations:
$$T_0 + 1 = 1;$$
$$T_n + 1 = 2T_{n-1} + 2, \qquad \text{for } n > 0.$$
Now if we let $U_n = T_n + 1$, we have
$$U_0 = 1;$$
$$U_n = 2U_{n-1}, \qquad \text{for } n > 0. \tag{1.3}$$
It doesn’t take genius to discover that the solution to this recurrence is just
$U_n = 2^n$; hence $T_n = 2^n - 1$. Even a computer could discover this.
Interesting: We get rid of the +1 in (1.1) by adding, not by subtracting.
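Taking the last remark literally, here is a small sketch of the substitution at work (the function names are assumptions of this illustration). It computes $U_n$ by pure doubling, subtracts 1 to recover $T_n$, and confirms that the result still satisfies the original recurrence.

```python
def u(n):
    """U_n from the doubling recurrence: U_0 = 1, U_n = 2*U_{n-1}."""
    value = 1
    for _ in range(n):
        value *= 2
    return value


def t(n):
    """T_n recovered from the substitution U_n = T_n + 1."""
    return u(n) - 1


# The recovered values satisfy T_0 = 0 and T_n = 2*T_{n-1} + 1.
assert t(0) == 0
assert all(t(n) == 2 * t(n - 1) + 1 for n in range(1, 40))
```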
1.2 LINES IN THE PLANE
Our second sample problem has a more geometric flavor: How many
slices of pizza can a person obtain by making n straight cuts with a pizza
knife? Or, more academically: What is the maximum number $L_n$ of regions
defined by n lines in the plane? This problem was first solved in 1826, by the
Swiss mathematician Jacob Steiner [278].
(A pizza with Swiss cheese?)
A region is convex if it includes all line segments between any two of its
points. (That’s not what my dictionary says, but it’s what mathematicians
believe.)
Again we start by looking at small cases, remembering to begin with the
smallest of all. The plane with no lines has one region; with one line it has
two regions; and with two lines it has four regions:
(Each line extends infinitely in both directions.)
Sure, we think, $L_n = 2^n$; of course! Adding a new line simply doubles
the number of regions. Unfortunately this is wrong. We could achieve the
doubling if the nth line would split each old region in two; certainly it can
split an old region in at most two pieces, since each old region is convex. (A
straight line can split a convex region into at most two new regions, which
will also be convex.) But when we add the third line (the thick one in the
diagram below) we soon find that it can split at most three of the old regions,
no matter how we’ve placed the first two lines:
Thus $L_3 = 4 + 3 = 7$ is the best we can do.
And after some thought we realize the appropriate generalization. The
nth line (for n > 0) increases the number of regions by k if and only if it
splits k of the old regions, and it splits k old regions if and only if it hits the
previous lines in $k-1$ different places. Two lines can intersect in at most one
point. Therefore the new line can intersect the $n-1$ old lines in at most $n-1$
different points, and we must have $k \le n$. We have established the upper
bound
$$L_n \le L_{n-1} + n, \qquad \text{for } n > 0.$$
Furthermore it’s easy to show by induction that we can achieve equality in
this formula. We simply place the nth line in such a way that it’s not parallel
to any of the others (hence it intersects them all), and such that it doesn’t go
through any of the existing intersection points (hence it intersects them all
in different places). The recurrence is therefore
$$L_0 = 1;$$
$$L_n = L_{n-1} + n, \qquad \text{for } n > 0. \tag{1.4}$$
The known values of $L_1$, $L_2$, and $L_3$ check perfectly here, so we’ll buy this.
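Recurrence (1.4) can be evaluated directly, one line at a time. In this sketch the function name `regions` is an assumption made for illustration:

```python
def regions(n):
    """L_n from recurrence (1.4): L_0 = 1, L_n = L_{n-1} + n."""
    total = 1                 # the empty plane is a single region
    for k in range(1, n + 1):
        total += k            # the k-th line adds k new regions
    return total


print([regions(n) for n in range(6)])   # [1, 2, 4, 7, 11, 16]
```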
Now we need a closed-form solution. We could play the guessing game
again, but 1, 2, 4, 7, 11, 16, . . . doesn’t look familiar; so let’s try another
tack. We can often understand a recurrence by “unfolding” or “unwinding”
it all the way to the end, as follows:
$$\begin{aligned}
L_n &= L_{n-1} + n \\
    &= L_{n-2} + (n-1) + n \\
    &= L_{n-3} + (n-2) + (n-1) + n \\
    &\;\;\vdots \\
    &= L_0 + 1 + 2 + \cdots + (n-2) + (n-1) + n \\
    &= 1 + S_n, \qquad \text{where } S_n = 1 + 2 + 3 + \cdots + (n-1) + n.
\end{aligned}$$
Unfolding? I’d call this “plugging in.”
In other words, $L_n$ is one more than the sum $S_n$ of the first n positive integers.
The quantity $S_n$ pops up now and again, so it’s worth making a table of
small values. Then we might recognize such numbers more easily when we
see them the next time:
  n     1   2   3   4   5   6   7   8   9  10  11  12  13   14
  S_n   1   3   6  10  15  21  28  36  45  55  66  78  91  105
These values are also called the triangular numbers, because $S_n$ is the number
of bowling pins in an n-row triangular array. For example, the usual four-row
array has $S_4 = 10$ pins.
To evaluate $S_n$ we can use a trick that Gauss reportedly came up with
in 1786, when he was nine years old [73] (see also Euler [92, part 1, §415]):
$$\begin{array}{rcccccccccc}
S_n &=& 1 &+& 2 &+& 3 &+ \cdots +& (n-1) &+& n \\
+\;S_n &=& n &+& (n-1) &+& (n-2) &+ \cdots +& 2 &+& 1 \\ \hline
2S_n &=& (n+1) &+& (n+1) &+& (n+1) &+ \cdots +& (n+1) &+& (n+1)
\end{array}$$
We merely add $S_n$ to its reversal, so that each of the n columns on the right
sums to $n+1$. Simplifying,
$$S_n = \frac{n(n+1)}{2}, \qquad \text{for } n \ge 0. \tag{1.5}$$
It seems a lot of stuff is attributed to Gauss: either he was really smart or
he had a great press agent.
Maybe he just had a magnetic personality.
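The reversal trick can be acted out literally: write the terms forward and backward, add them column by column, and halve the total. The function name in this sketch is an assumption made for illustration.

```python
def gauss_sum(n):
    """S_n = 1 + 2 + ... + n, computed by adding the sum to its reversal."""
    forward = list(range(1, n + 1))      # 1, 2, ..., n
    backward = forward[::-1]             # n, n-1, ..., 1
    columns = [a + b for a, b in zip(forward, backward)]
    # Each of the n columns sums to n + 1.
    assert all(c == n + 1 for c in columns)
    return sum(columns) // 2             # 2*S_n = n*(n+1)


print(gauss_sum(4))    # 10, the four-row bowling-pin array
```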
Actually Gauss is often called the greatest mathematician of all time.
So it’s nice to be able to understand at least one of his discoveries.
When in doubt, look at the words. Why is it “closed,” as opposed to
“open”? What image does it bring to mind?
Answer: The equation is “closed,” not defined in terms of itself, not leading
to recurrence. The case is “closed”: it won’t happen again. Metaphors are the
key.
Is “zig” a technical term?
OK, we have our solution:
$$L_n = \frac{n(n+1)}{2} + 1, \qquad \text{for } n \ge 0. \tag{1.6}$$
As experts, we might be satisfied with this derivation and consider it
a proof, even though we waved our hands a bit when doing the unfolding
and reflecting. But students of mathematics should be able to meet stricter
standards; so it’s a good idea to construct a rigorous proof by induction. The
key induction step is
$$L_n = L_{n-1} + n = \bigl(\tfrac12(n-1)n + 1\bigr) + n = \tfrac12 n(n+1) + 1.$$
Now there can be no doubt about the closed form (1.6).
Incidentally we’ve been talking about “closed forms” without explicitly
saying what we mean. Usually it’s pretty clear. Recurrences like (1.1)
and (1.4) are not in closed form (they express a quantity in terms of itself);
but solutions like (1.2) and (1.6) are. Sums like $1 + 2 + \cdots + n$ are not in
closed form (they cheat by using ‘$\cdots$’); but expressions like $n(n+1)/2$ are.
We could give a rough definition like this: An expression for a quantity f(n)
is in closed form if we can compute it using at most a fixed number of “well
known” standard operations, independent of n. For example, $2^n - 1$ and
$n(n+1)/2$ are closed forms because they involve only addition, subtraction,
multiplication, division, and exponentiation, in explicit ways.
The total number of simple closed forms is limited, and there are recurrences
that don’t have simple closed forms. When such recurrences turn out
to be important, because they arise repeatedly, we add new operations to our
repertoire; this can greatly extend the range of problems solvable in “simple”
closed form. For example, the product of the first n integers, n!, has proved
to be so important that we now consider it a basic operation. The formula
‘n!’ is therefore in closed form, although its equivalent
‘$1 \cdot 2 \cdot \ldots \cdot n$’ is not.
And now, briefly, a variation of the lines-in-the-plane problem: Suppose
that instead of straight lines we use bent lines, each containing one “zig.”
What is the maximum number $Z_n$ of regions determined by n such bent lines
in the plane? We might expect $Z_n$ to be about twice as big as $L_n$, or maybe
three times as big. Let’s see:

[Figure: small cases of bent lines in the plane; a single bent line gives two regions.]
From these small cases, and after a little thought, we realize that a bent
line is like two straight lines except that regions merge when the “two” lines
don’t extend past their intersection point.

[Figure: two crossing lines forming regions 1, 2, 3, and 4; the parts of the lines beyond the intersection point are dotted.]

... and a little afterthought...
Regions 2, 3, and 4, which would be distinct with two lines, become a single
region when there’s a bent line; we lose two regions. However, if we arrange
things properly (the zig point must lie “beyond” the intersections with the
other lines) that’s all we lose; that is, we lose only two regions per line. Thus

\[ Z_n = L_{2n} - 2n = 2n(2n+1)/2 + 1 - 2n = 2n^2 - n + 1, \qquad \text{for } n \ge 0. \tag{1.7} \]

Exercise 18 has the details.
Comparing the closed forms (1.6) and (1.7), we find that for large n,

\[ L_n \sim \tfrac12 n^2, \qquad Z_n \sim 2n^2; \]
so we get about four times as many regions with bent lines as with straight
lines. (In later chapters we’ll be discussing how to analyze the approximate
behavior of integer functions when n is large.)
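The factor of four is easy to see numerically. Here is a small Python sketch, with illustrative function names, that evaluates $Z_n = L_{2n} - 2n$ from (1.6) and (1.7) and prints the ratio $Z_n / L_n$ for growing n.

```python
def straight_regions(n: int) -> int:
    """L_n from (1.6): n(n+1)/2 + 1."""
    return n * (n + 1) // 2 + 1


def bent_regions(n: int) -> int:
    """Z_n from (1.7): n bent lines act like 2n straight lines, minus 2 regions per line."""
    return straight_regions(2 * n) - 2 * n


if __name__ == "__main__":
    for n in (1, 2, 10, 100, 1000):
        ratio = bent_regions(n) / straight_regions(n)
        print(n, straight_regions(n), bent_regions(n), round(ratio, 4))
```

The ratio approaches 4 from below as n grows, in line with the asymptotic comparison.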
1.3 THE JOSEPHUS PROBLEM
Our final introductory example is a variant of an ancient problem
named for Flavius Josephus, a famous historian of the first century. Legend
has it that Josephus wouldn’t have lived to become famous without his math-
ematical talents. During the Jewish-Roman war, he was among a band of 41
Jewish rebels trapped in a cave by the Romans. Preferring suicide to capture,
the rebels decided to form a circle and, proceeding around it, to kill every
third remaining person until no one was left. But Josephus, along with an
unindicted co-conspirator, wanted none of this suicide nonsense; so he quickly
calculated where he and his friend should stand in the vicious circle.
(Ahrens [5, vol. 2] and Herstein and Kaplansky [156] discuss the interesting
history of this problem. Josephus himself [ISS] is a bit vague.)

... thereby saving his tale for us to hear.

In our variation, we start with n people numbered 1 to n around a circle,
and we eliminate every second remaining person until only one survives. For
example, here’s the starting configuration for n = 10:

[Figure: the numbers 1 through 10 arranged around a circle.]

The elimination order is 2, 4, 6, 8, 10, 3, 7, 1, 9, so 5 survives. The problem:
Determine the survivor’s number, J(n).

Here’s a case where n = 0 makes no sense.

Even so, a bad guess isn’t a waste of time, because it gets us involved in
the problem.

This is the tricky part: We have $J(2n) = \mathrm{newnumber}(J(n))$, where
$\mathrm{newnumber}(k) = 2k - 1$.
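The elimination process itself can be simulated directly. The sketch below is a brute-force Python illustration (the name `josephus_order` is hypothetical, not from the text): it keeps the circle in a `collections.deque`, skips one person, removes the next, and repeats until one person is left.

```python
from collections import deque


def josephus_order(n: int):
    """Simulate eliminating every second remaining person among 1..n.

    Returns (elimination_order, survivor).
    """
    circle = deque(range(1, n + 1))
    order = []
    while len(circle) > 1:
        circle.rotate(-1)               # skip one person (move them to the back)
        order.append(circle.popleft())  # eliminate the next person
    return order, circle[0]


if __name__ == "__main__":
    order, survivor = josephus_order(10)
    print(order, survivor)  # [2, 4, 6, 8, 10, 3, 7, 1, 9] 5
```

A simulation like this is handy for generating small cases to test conjectures about J(n).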
We just saw that J(10) = 5. We might conjecture that J(n) = n/2 when
n is even; and the case n = 2 supports the conjecture: J(2) = 1. But a few
other small cases dissuade us; the conjecture fails for n = 4 and n = 6.

     n    | 1  2  3  4  5  6
     J(n) | 1  1  3  1  3  5
It’s back to the drawing board; let’s try to make a better guess. Hmmm . . .
J(n) always seems to be odd. And in fact, there’s a good reason for this: The
first trip around the circle eliminates all the even numbers. Furthermore, if
n itself is an even number, we arrive at a situation similar to what we began
with, except that there are only half as many people, and their numbers have
changed.
So let’s suppose that we have 2n people originally. After the first
go-round, we’re left with

[Figure: the survivors 1, 3, 5, 7, ..., 2n-3, 2n-1 arranged around a circle.]

and 3 will be the next to go. This is just like starting out with n people, except
that each person’s number has been doubled and decreased by 1. That is,

\[ J(2n) = 2J(n) - 1, \qquad \text{for } n \ge 1. \]
We can now go quickly to large n. For example, we know that J(10) = 5, so

\[ J(20) = 2J(10) - 1 = 2 \cdot 5 - 1 = 9. \]

Similarly J(40) = 17, and we can deduce that $J(5 \cdot 2^m) = 2^{m+1} + 1$.
But what about the odd case? With 2n + 1 people, it turns out that
person number 1 is wiped out just after person number 2n, and we’re left with

[Figure: the survivors 3, 5, 7, 9, ..., 2n-1, 2n+1 arranged around a circle.]

Again we almost have the original situation with n people, but this time their
numbers are doubled and increased by 1. Thus

\[ J(2n+1) = 2J(n) + 1, \qquad \text{for } n \ge 1. \]

Odd case? Hey, leave my brother out of it.
Combining these equations with J(1) = 1 gives us a recurrence that defines J
in all cases:

\begin{align*}
J(1) &= 1; \\
J(2n) &= 2J(n) - 1, && \text{for } n \ge 1; \tag{1.8} \\
J(2n+1) &= 2J(n) + 1, && \text{for } n \ge 1.
\end{align*}
Instead of getting J(n) from J(n-1), this recurrence is much more “efficient,”
because it reduces n by a factor of 2 or more each time it’s applied. We could
compute J(1000000), say, with only 19 applications of (1.8). But still, we seek
a closed form, because that will be even quicker and more informative. After
all, this is a matter of life or death.
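To make the efficiency claim concrete, here is a minimal recursive Python sketch of recurrence (1.8) that also counts how many times the recurrence is applied. The function name and the returned pair are choices made for this illustration.

```python
def josephus(n: int):
    """Evaluate J(n) by recurrence (1.8).

    Returns (J(n), number of applications of (1.8)).
    """
    if n == 1:
        return 1, 0                      # basis: J(1) = 1
    half, steps = josephus(n // 2)       # n = 2k or n = 2k + 1, with k = n // 2
    if n % 2 == 0:
        return 2 * half - 1, steps + 1   # J(2k) = 2J(k) - 1
    return 2 * half + 1, steps + 1       # J(2k+1) = 2J(k) + 1


if __name__ == "__main__":
    print(josephus(10))        # (5, 3)
    print(josephus(1000000))   # (951425, 19)
```

Each application halves n, so the recursion depth is one less than the number of binary digits of n.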
Our recurrence makes it possible to build a table of small values very
quickly. Perhaps we’ll be able to spot a pattern and guess the answer.
     n    | 1 | 2  3 | 4  5  6  7 | 8  9 10 11 12 13 14 15 | 16
     J(n) | 1 | 1  3 | 1  3  5  7 | 1  3  5  7  9 11 13 15 |  1

Voilà! It seems we can group by powers of 2 (marked by vertical lines in
the table); J(n) is always 1 at the beginning of a group and it increases by 2
within a group. So if we write n in the form $n = 2^m + l$, where $2^m$ is the
largest power of 2 not exceeding n and where l is what’s left, the solution to
our recurrence seems to be

\[ J(2^m + l) = 2l + 1, \qquad \text{for } m \ge 0 \text{ and } 0 \le l < 2^m. \tag{1.9} \]

(Notice that if $2^m \le n < 2^{m+1}$, the remainder $l = n - 2^m$ satisfies
$0 \le l < 2^{m+1} - 2^m = 2^m$.)
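The conjectured solution (1.9) translates into a few lines of code. The Python sketch below (names are illustrative) splits n into $2^m + l$ using `int.bit_length` and compares the result against a direct evaluation of recurrence (1.8).

```python
def josephus_closed(n: int) -> int:
    """Closed form (1.9): write n = 2^m + l with 0 <= l < 2^m, return 2l + 1."""
    m = n.bit_length() - 1   # 2^m is the largest power of 2 not exceeding n
    l = n - (1 << m)         # what's left
    return 2 * l + 1


def josephus_recurrence(n: int) -> int:
    """Recurrence (1.8), used here only as a reference for comparison."""
    if n == 1:
        return 1
    half = josephus_recurrence(n // 2)
    return 2 * half - 1 if n % 2 == 0 else 2 * half + 1


if __name__ == "__main__":
    print([josephus_closed(n) for n in range(1, 17)])
```

Agreement on a finite range is evidence for (1.9), not a proof of it.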
We must now prove (1.9). As in the past we use induction, but this time
the induction is on m. When m = 0 we must have l = 0; thus the basis of
(1.9) reduces to J(1) = 1, which is true. The induction step has two parts,
depending on whether l is even or odd. If m > 0 and $2^m + l = 2n$, then l is
even and

\[ J(2^m + l) = 2J(2^{m-1} + l/2) - 1 = 2(2l/2 + 1) - 1 = 2l + 1, \]

by (1.8) and the induction hypothesis; this is exactly what we want. A similar
proof works in the odd case, when $2^m + l = 2n + 1$. We might also note that
(1.8) implies the relation

\[ J(2n+1) - J(2n) = 2. \]

Either way, the induction is complete and (1.9) is established.

But there’s a simpler way! The key fact is that $J(2^m) = 1$ for all m, and this
follows immediately from our first equation, J(2n) = 2J(n) - 1. Hence we know
that the first person will survive whenever n is a power of 2. And in the
general case, when $n = 2^m + l$, the number of people is reduced to a power
of 2 after there have been l executions. The first remaining person at this
point, the survivor, is number 2l + 1.
To illustrate solution (1.9), let’s compute J(100). In this case we have
$100 = 2^6 + 36$, so $J(100) = 2 \cdot 36 + 1 = 73$.
Now that we’ve done the hard stuff (solved the problem) we seek the
soft: Every solution to a problem can be generalized so that it applies to a
wider class of problems. Once we’ve learned a technique, it’s instructive to
look at it closely and see how far we can go with it. Hence, for the rest of this
section, we will examine the solution (1.9) and explore some generalizations
of the recurrence (1.8). These explorations will uncover the structure that
underlies all such problems.
Powers of 2 played an important role in our finding the solution, so it’s
natural to look at the radix 2 representations of n and J(n). Suppose n’s
binary expansion is
\[ n = (b_m b_{m-1} \ldots b_1 b_0)_2; \]

that is,

\[ n = b_m 2^m + b_{m-1} 2^{m-1} + \cdots + b_1 2 + b_0, \]

where each $b_i$ is either 0 or 1 and where the leading bit $b_m$ is 1. Recalling
that $n = 2^m + l$, we have, successively,

\begin{align*}
n &= (1\, b_{m-1} b_{m-2} \ldots b_1 b_0)_2, \\
l &= (0\, b_{m-1} b_{m-2} \ldots b_1 b_0)_2, \\
2l &= (b_{m-1} b_{m-2} \ldots b_1 b_0\, 0)_2, \\
2l + 1 &= (b_{m-1} b_{m-2} \ldots b_1 b_0\, 1)_2, \\
J(n) &= (b_{m-1} b_{m-2} \ldots b_1 b_0\, b_m)_2.
\end{align*}

(The last step follows because J(n) = 2l + 1 and because $b_m = 1$.) We have
proved that

\[ J((b_m b_{m-1} \ldots b_1 b_0)_2) = (b_{m-1} \ldots b_1 b_0 b_m)_2; \tag{1.10} \]
that is, in the lingo of computer programming, we get J(n) from n by doing
a one-bit cyclic shift left! Magic. For example, if $n = 100 = (1100100)_2$
then $J(n) = J((1100100)_2) = (1001001)_2$,
which is 64 + 8 + 1 = 73. If we had been
working all along in binary notation, we probably would have spotted this
pattern immediately.
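The cyclic-shift rule (1.10) can be tried out on binary strings. This is a small Python sketch under the assumption that string manipulation is acceptable for illustration; the name `josephus_shift` is hypothetical.

```python
def josephus_shift(n: int) -> int:
    """Apply (1.10): rotate the binary digits of n one place to the left."""
    bits = bin(n)[2:]               # binary digits of n, leading bit is 1
    rotated = bits[1:] + bits[0]    # move the leading bit to the right end
    return int(rotated, 2)          # leading zeros, if any, simply disappear


if __name__ == "__main__":
    n = 100
    print(bin(n)[2:], "->", bin(josephus_shift(n))[2:], "=", josephus_shift(n))
```

For n = 100 this turns 1100100 into 1001001, which is 73, matching the value computed from (1.9).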
If we start with n and iterate the J function m + 1 times, we’re doing
m + 1 one-bit cyclic shifts; so, since n is an (m+1)-bit number, we might
expect to end up with n again. But this doesn’t quite work. For instance
if n = 13 we have $J((1101)_2) = (1011)_2$, but then $J((1011)_2) = (111)_2$ and
the process breaks down; the 0 disappears when it becomes the leading bit.
In fact, J(n) must always be $\le n$ by definition, since J(n) is the survivor’s
number; hence if J(n) < n we can never get back up to n by continuing to
iterate.

(“Iteration” means applying a function to itself.)
Repeated application of J produces a sequence of decreasing values that
eventually reach a “fixed point,” where J(n) = n. The cyclic shift property
makes it easy to see what that fixed point will be: Iterating the function
enough times will always produce a pattern of all 1’s whose value is
$2^{\nu(n)} - 1$, where $\nu(n)$ is the number of 1 bits in the binary
representation of n. Thus, since $\nu(13) = 3$, we have

\[ \underbrace{J(J(\ldots J}_{\text{2 or more } J\text{'s}}(13)\ldots)) = 2^3 - 1 = 7; \]

similarly

\[ \underbrace{J(J(\ldots J}_{\text{8 or more}}((101101101101011)_2)\ldots)) = 2^{10} - 1 = 1023. \]
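The fixed-point behavior is easy to observe by iterating the cyclic shift until nothing changes. The following Python sketch uses hypothetical helper names and the string-based shift as an illustration of (1.10); it reports the fixed point and how many iterations were needed.

```python
def josephus_shift(n: int) -> int:
    """One application of J via the one-bit cyclic left shift (1.10)."""
    bits = bin(n)[2:]
    return int(bits[1:] + bits[0], 2)


def fixed_point(n: int):
    """Iterate J until J(n) = n; return (fixed point, iterations used)."""
    steps = 0
    while josephus_shift(n) != n:
        n = josephus_shift(n)
        steps += 1
    return n, steps


def ones(n: int) -> int:
    """nu(n): the number of 1 bits in the binary representation of n."""
    return bin(n).count("1")


if __name__ == "__main__":
    for start in (13, 0b101101101101011):
        value, steps = fixed_point(start)
        print(start, "->", value, "after", steps, "steps;", 2 ** ones(start) - 1)
```

The loop terminates because every shift that changes n strictly decreases it.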
Curious, but true.

Let’s return briefly to our first guess, that J(n) = n/2 when n is even.
This is obviously not true in general, but we can now determine exactly when
it is true:

\begin{align*}
J(n) &= n/2, \\
2l + 1 &= (2^m + l)/2, \\
l &= \tfrac13 (2^m - 2).
\end{align*}

If this number $l = \tfrac13 (2^m - 2)$ is an integer, then $n = 2^m + l$ will be
a solution, because l will be less than $2^m$. It’s not hard to verify that
$2^m - 2$ is a multiple of 3 when m is odd, but not when m is even. (We will
study such things in Chapter 4.) Therefore there are infinitely many solutions
to the equation

Curiously enough, if M is a compact $C^\infty$ n-manifold (n > 1), there exists
a differentiable immersion of M into $R^{2n-\nu(n)}$ but not necessarily into
$R^{2n-\nu(n)-1}$. I wonder if Josephus was secretly a topologist?
J(n) = n/2, beginning as follows:
     m     l    n = 2^m + l    J(n) = 2l + 1 = n/2    n (binary)
     1     0         2                  1                      10
     3     2        10                  5                    1010
     5    10        42                 21                  101010
     7    42       170                 85                10101010
Notice the pattern in the rightmost column. These are the binary numbers
for which cyclic-shifting one place left produces the same result as ordinary-
shifting one place right (halving).
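A brute-force search confirms the first few solutions. The Python sketch below (illustrative names, closed form (1.9) used for J) lists every even n below a bound with J(n) = n/2, together with its binary representation.

```python
def josephus_closed(n: int) -> int:
    """J(n) = 2l + 1 where n = 2^m + l, as in (1.9)."""
    m = n.bit_length() - 1
    return 2 * (n - (1 << m)) + 1


def half_solutions(bound: int):
    """All n < bound with J(n) = n/2 (n must be even for n/2 to be an integer)."""
    return [n for n in range(2, bound, 2) if 2 * josephus_closed(n) == n]


if __name__ == "__main__":
    for n in half_solutions(200):
        print(n, josephus_closed(n), bin(n)[2:])
```

The binary column shows the alternating 10 pattern noted in the text.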
Looks like Greek
to me.
OK, we understand the J function pretty well; the next step is to generalize
it. What would have happened if our problem had produced a recurrence
that was something like (1.8), but with different constants? Then we might
not have been lucky enough to guess the solution, because the solution might
have been really weird. Let’s investigate this by introducing constants
$\alpha$, $\beta$, and $\gamma$ and trying to find a closed form for the more
general recurrence

\begin{align*}
f(1) &= \alpha; \\
f(2n) &= 2f(n) + \beta, && \text{for } n \ge 1; \tag{1.11} \\
f(2n+1) &= 2f(n) + \gamma, && \text{for } n \ge 1.
\end{align*}
(Our original recurrence had $\alpha = 1$, $\beta = -1$, and $\gamma = 1$.)
Starting with $f(1) = \alpha$ and working our way up, we can construct the
following general table for small values of n:

     n    f(n)
     1    \alpha
     2    2\alpha + \beta
     3    2\alpha + \gamma
     4    4\alpha + 3\beta
     5    4\alpha + 2\beta + \gamma                    (1.12)
     6    4\alpha + \beta + 2\gamma
     7    4\alpha + 3\gamma
     8    8\alpha + 7\beta
     9    8\alpha + 6\beta + \gamma
It seems that $\alpha$’s coefficient is n’s largest power of 2. Furthermore,
between powers of 2, $\beta$’s coefficient decreases by 1 down to 0 and
$\gamma$’s increases by 1 up from 0. Therefore if we express f(n) in the form

\[ f(n) = A(n)\,\alpha + B(n)\,\beta + C(n)\,\gamma, \tag{1.13} \]

by separating out its dependence on $\alpha$, $\beta$, and $\gamma$, it seems that

\begin{align*}
A(n) &= 2^m; \\
B(n) &= 2^m - 1 - l; \tag{1.14} \\
C(n) &= l.
\end{align*}

Here, as usual, $n = 2^m + l$ and $0 \le l < 2^m$, for $n \ge 1$.
It’s not terribly hard to prove (1.13) and (1.14) by induction, but the
calculations are messy and uninformative. Fortunately there’s a better way
to proceed, by choosing particular values and then combining them. Let’s
illustrate this by considering the special case $\alpha = 1$,
$\beta = \gamma = 0$, when f(n) is supposed to be equal to A(n): Recurrence
(1.11) becomes

\begin{align*}
A(1) &= 1; \\
A(2n) &= 2A(n), && \text{for } n \ge 1; \\
A(2n+1) &= 2A(n), && \text{for } n \ge 1.
\end{align*}

Sure enough, it’s true (by induction on m) that $A(2^m + l) = 2^m$.

Hold onto your hats, this next part is new stuff.
Next, let’s use recurrence (1.11) and solution (1.13) in reverse, by starting
with a simple function f(n) and seeing if there are any constants
$(\alpha, \beta, \gamma)$ that will define it. Plugging in the constant
function f(n) = 1 says that

\begin{align*}
1 &= \alpha; \\
1 &= 2 \cdot 1 + \beta; \\
1 &= 2 \cdot 1 + \gamma;
\end{align*}

hence the values $(\alpha, \beta, \gamma) = (1, -1, -1)$ satisfying these
equations will yield A(n) - B(n) - C(n) = f(n) = 1. Similarly, we can plug in
f(n) = n:

\begin{align*}
1 &= \alpha; \\
2n &= 2 \cdot n + \beta; \\
2n + 1 &= 2 \cdot n + \gamma;
\end{align*}

These equations hold for all n when $\alpha = 1$, $\beta = 0$, and
$\gamma = 1$, so we don’t need to prove by induction that these parameters
will yield f(n) = n. We already know that f(n) = n will be the solution in
such a case, because the recurrence (1.11) uniquely defines f(n) for every
value of n.

A neat idea!
And now we’re essentially done! We have shown that the functions A(n),
B(n), and C(n) of (1.13), which solve (1.11) in general, satisfy the equations

\begin{align*}
A(n) &= 2^m, \qquad \text{where } n = 2^m + l \text{ and } 0 \le l < 2^m; \\
A(n) - B(n) - C(n) &= 1; \\
A(n) + C(n) &= n.
\end{align*}

Beware: The authors are expecting us to figure out the idea of the repertoire
method from seat-of-the-pants examples, instead of giving us a top-down
presentation. The method works best with recurrences that are “linear,” in
the sense that their solutions can be expressed as a sum of arbitrary
parameters multiplied by functions of n, as in (1.13). Equation (1.13) is
the key.

(‘relax’ = ‘destroy’)

I think I get it: The binary representations of A(n), B(n), and C(n) have
1’s in different positions.
Our conjectures in (1.14) follow immediately, since we can solve these
equations to get $C(n) = n - A(n) = l$ and
$B(n) = A(n) - 1 - C(n) = 2^m - 1 - l$.
This approach illustrates a surprisingly useful repertoire method for solving
recurrences. First we find settings of general parameters for which we
know the solution; this gives us a repertoire of special cases that we can solve.
Then we obtain the general case by combining the special cases. We need as
many independent special solutions as there are independent parameters (in
this case three, for $\alpha$, $\beta$, and $\gamma$). Exercises 16 and 20
provide further examples of the repertoire approach.
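The result of the repertoire argument can be checked mechanically. In this Python sketch (function names are illustrative), `f` evaluates recurrence (1.11) directly and `coefficients` returns the triple (A(n), B(n), C(n)) from (1.14); the loop confirms that the linear combination (1.13) reproduces f for a few arbitrary parameter choices, including the original Josephus values.

```python
def f(n: int, alpha: int, beta: int, gamma: int) -> int:
    """Generalized Josephus recurrence (1.11)."""
    if n == 1:
        return alpha
    offset = beta if n % 2 == 0 else gamma
    return 2 * f(n // 2, alpha, beta, gamma) + offset


def coefficients(n: int):
    """(A(n), B(n), C(n)) from (1.14), with n = 2^m + l."""
    m = n.bit_length() - 1
    l = n - (1 << m)
    return (1 << m), (1 << m) - 1 - l, l


if __name__ == "__main__":
    for alpha, beta, gamma in [(1, -1, 1), (1, -1, -1), (1, 0, 1), (3, 7, -2)]:
        ok = True
        for n in range(1, 500):
            a, b, c = coefficients(n)
            ok = ok and f(n, alpha, beta, gamma) == a * alpha + b * beta + c * gamma
        print((alpha, beta, gamma), ok)
```

The parameter triples (1, -1, -1) and (1, 0, 1) are the two repertoire cases from the text, giving f(n) = 1 and f(n) = n respectively.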
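We know that the original J-recurrence has a magical solution, in binary:

\[ J((b_m b_{m-1} \ldots b_1 b_0)_2) = (b_{m-1} \ldots b_1 b_0 b_m)_2, \qquad \text{where } b_m = 1. \]

Does the generalized Josephus recurrence admit of such magic?
Sure, why not? We can rewrite the generalized recurrence (1.11) as

\begin{align*}
f(1) &= \alpha; \\
f(2n + j) &= 2f(n) + \beta_j, && \text{for } j = 0, 1 \text{ and } n \ge 1, \tag{1.15}
\end{align*}

if we let $\beta_0 = \beta$ and $\beta_1 = \gamma$. And this recurrence unfolds,
binary-wise:

\begin{align*}
f((b_m b_{m-1} \ldots b_1 b_0)_2) &= 2f((b_m b_{m-1} \ldots b_1)_2) + \beta_{b_0} \\
&= 4f((b_m b_{m-1} \ldots b_2)_2) + 2\beta_{b_1} + \beta_{b_0} \\
&\;\;\vdots \\
&= 2^m f((b_m)_2) + 2^{m-1}\beta_{b_{m-1}} + \cdots + 2\beta_{b_1} + \beta_{b_0} \\
&= 2^m \alpha + 2^{m-1}\beta_{b_{m-1}} + \cdots + 2\beta_{b_1} + \beta_{b_0}.
\end{align*}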
Suppose we now relax the radix 2 notation to allow arbitrary digits instead
of just 0 and 1. The derivation above tells us that

\[ f((b_m b_{m-1} \ldots b_1 b_0)_2) = (\alpha\; \beta_{b_{m-1}}\; \beta_{b_{m-2}} \ldots \beta_{b_1}\; \beta_{b_0})_2. \tag{1.16} \]

Nice. We would have seen this pattern earlier if we had written (1.12) in
another way:
For example, when $n = 100 = (1100100)_2$, our original Josephus values
$\alpha = 1$, $\beta = -1$, and $\gamma = 1$ yield

\begin{align*}
n &= (1\;\;1\;\;0\;\;0\;\;1\;\;0\;\;0)_2 = 100 \\
f(n) &= (1\;\;1\;\,{-1}\;\,{-1}\;\;1\;\,{-1}\;\,{-1})_2 \\
&= +64 + 32 - 16 - 8 + 4 - 2 - 1 = 73
\end{align*}

as before. The cyclic-shift property follows because each block of binary digits
$(1\,0 \ldots 0\,0)_2$ in the representation of n is transformed into

\[ (1\;{-1} \ldots {-1}\;{-1})_2 = (0\,0 \ldots 0\,1)_2. \]
So our change of notation has given us the compact solution (1.16) to the
general recurrence (1.15). If we’re really uninhibited we can now generalize
even more. The recurrence

\begin{align*}
f(j) &= \alpha_j, && \text{for } 1 \le j < d; \\
f(dn + j) &= c\,f(n) + \beta_j, && \text{for } 0 \le j < d \text{ and } n \ge 1, \tag{1.17}
\end{align*}

is the same as the previous one except that we start with numbers in radix d
and produce values in radix c. That is, it has the radix-changing solution

\[ f((b_m b_{m-1} \ldots b_1 b_0)_d) = (\alpha_{b_m}\; \beta_{b_{m-1}}\; \beta_{b_{m-2}} \ldots \beta_{b_1}\; \beta_{b_0})_c. \tag{1.18} \]

For example, suppose that by some stroke of luck we’re given the recurrence

\begin{align*}
f(1) &= 34, \\
f(2) &= 5, \\
f(3n) &= 10f(n) + 76, && \text{for } n \ge 1, \\
f(3n+1) &= 10f(n) - 2, && \text{for } n \ge 1, \\
f(3n+2) &= 10f(n) + 8, && \text{for } n \ge 1,
\end{align*}

and suppose we want to compute f(19). Here we have d = 3 and c = 10. Now
$19 = (201)_3$, and the radix-changing solution tells us to perform a
digit-by-digit replacement from radix 3 to radix 10. So the leading 2 becomes
a 5, and the 0 and 1 become 76 and -2, giving

\[ f(19) = f((201)_3) = (5\;\;76\;\,{-2})_{10} = 1258, \]

which is our answer.

Thus Josephus and the Jewish-Roman war have led us to some interesting
general recurrences.

“There are two kinds of generalizations. One is cheap and the other is
valuable. It is easy to generalize by diluting a little idea with a big
terminology. It is much more difficult to prepare a refined and condensed
extract from several good ingredients.”
- G. Pólya [238]

Perhaps this was a stroke of bad luck.

But in general I’m against recurrences of war.
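The radix-changing solution (1.18) is short to implement. The Python sketch below is one possible rendering, with hypothetical names: `radix_change` reads the radix-d digits of n, replaces the leading digit via `alpha` and the remaining digits via `beta`, and evaluates the result in radix c. A direct recursive evaluation of the example recurrence is included for comparison.

```python
def radix_change(n: int, d: int, c: int, alpha: dict, beta: list) -> int:
    """Evaluate (1.18): digit-by-digit replacement from radix d to radix c."""
    digits = []
    while n > 0:
        n, r = divmod(n, d)
        digits.append(r)
    digits.reverse()                  # most significant digit first
    value = alpha[digits[0]]          # leading digit b_m becomes alpha_{b_m}
    for b in digits[1:]:
        value = c * value + beta[b]   # each later digit b becomes beta_b
    return value


def f_example(n: int) -> int:
    """The example recurrence from the text, with d = 3 and c = 10."""
    if n == 1:
        return 34
    if n == 2:
        return 5
    q, j = divmod(n, 3)
    return 10 * f_example(q) + (76, -2, 8)[j]


if __name__ == "__main__":
    alpha = {1: 34, 2: 5}
    beta = [76, -2, 8]
    print(radix_change(19, 3, 10, alpha, beta), f_example(19))  # 1258 1258
```

With d = c = 2, `alpha = {1: 1}` and `beta = [-1, 1]`, the same function reproduces the original Josephus values.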
Exercises
Warmups
Please do all the warmups in all the chapters!
- The Mgm’t

1 All horses are the same color; we can prove this by induction on the
number of horses in a given set. Here’s how: “If there’s just one horse
then it’s the same color as itself, so the basis is trivial. For the induction
step, assume that there are n horses numbered 1 to n. By the induction
hypothesis, horses 1 through n - 1 are the same color, and similarly
horses 2 through n are the same color. But the middle horses, 2 through
n - 1, can’t change color when they’re in different groups; these are
horses, not chameleons. So horses 1 and n must be the same color as
well, by transitivity. Thus all n horses are the same color; QED.” What,
if anything, is wrong with this reasoning?
2 Find the shortest sequence of moves that transfers a tower of n disks
from the left peg A to the right peg B, if direct moves between A and B
are disallowed. (Each move must be to or from the middle peg. As usual,
a larger disk must never appear above a smaller one.)
3 Show that, in the process of transferring a tower under the restrictions of
the preceding exercise, we will actually encounter every properly stacked
arrangement of n disks on three pegs.
4 Are there any starting and ending configurations of n disks on three pegs
that are more than $2^n - 1$ moves apart, under Lucas’s original rules?
5 A “Venn diagram” with three overlapping circles is often used to illustrate
the eight possible subsets associated with three given sets:
Can the sixteen possibilities that arise with four given sets be illustrated
by four overlapping circles?
6 Some of the regions defined by n lines in the plane are infinite, while
others are bounded. What’s the maximum possible number of bounded
regions?
7 Let H(n) = J(n+1) - J(n). Equation (1.8) tells us that H(2n) = 2, and
H(2n+1) = J(2n+2) - J(2n+1) = (2J(n+1) - 1) - (2J(n) + 1) = 2H(n) - 2,
for all $n \ge 1$. Therefore it seems possible to prove that H(n) = 2 for all n,
by induction on n. What’s wrong here?
Homework exercises
8 Solve the recurrence

\begin{align*}
Q_0 &= \alpha; \qquad Q_1 = \beta; \\
Q_n &= (1 + Q_{n-1})/Q_{n-2}, \qquad \text{for } n > 1.
\end{align*}

Assume that $Q_n \ne 0$ for all $n \ge 0$. Hint: $Q_4 = (1 + \alpha)/\beta$.
9 Sometimes it’s possible to use induction backwards, proving things from
n to n - 1 instead of vice versa! For example, consider the statement

\[ P(n):\quad x_1 \ldots x_n \le \Bigl(\frac{x_1 + \cdots + x_n}{n}\Bigr)^{n}, \qquad \text{if } x_1, \ldots, x_n \ge 0. \]

This is true when n = 2, since $(x_1 + x_2)^2 - 4x_1x_2 = (x_1 - x_2)^2 \ge 0$.
a By setting $x_n = (x_1 + \cdots + x_{n-1})/(n-1)$, prove that P(n) implies
P(n - 1) whenever n > 1.
b Show that P(n) and P(2) imply P(2n).
c Explain why this implies the truth of P(n) for all n.

... now that’s a horse of a different color.
10 Let $Q_n$ be the minimum number of moves needed to transfer a tower of
n disks from A to B if all moves must be clockwise, that is, from A
to B, or from B to the other peg, or from the other peg to A. Also let $R_n$
be the minimum number of moves needed to go from B back to A under
this restriction. Prove that

\[ Q_n = \begin{cases} 0, & \text{if } n = 0; \\ 2R_{n-1} + 1, & \text{if } n > 0; \end{cases}
\qquad
R_n = \begin{cases} 0, & \text{if } n = 0; \\ Q_n + Q_{n-1} + 1, & \text{if } n > 0. \end{cases} \]

(You need not solve these recurrences; we’ll see how to do that in Chapter 7.)
11 A Double Tower of Hanoi contains 2n disks of n different sizes, two of
each size. As usual, we’re required to move only one disk at a time,
without putting a larger one over a smaller one.
a How many moves does it take to transfer a double tower from one
peg to another, if disks of equal size are indistinguishable from each
other?
b What if we are required to reproduce the original top-to-bottom
order of all the equal-size disks in the final arrangement? [Hint:
This is difficult; it’s really a “bonus problem.”]
12 Let’s generalize exercise 11a even further, by assuming that there are
m different sizes of disks and exactly $n_k$ disks of size k. Determine
$A(n_1, \ldots, n_m)$, the minimum number of moves needed to transfer a tower
when equal-size disks are considered to be indistinguishable.
13 What’s the maximum number of regions definable by n zig-zag lines,
each of which consists of two parallel infinite half-lines joined by a straight
segment?

[Figure: two zig-zag lines, captioned $ZZ_2 = 12$.]

Good luck keeping the cheese in position.
14 How many pieces of cheese can you obtain from a single thick piece by
making five straight slices? (The cheese must stay in its original position
while you do all the cutting, and each slice must correspond to a plane
in 3D.) Find a recurrence relation for $P_n$, the maximum number of
three-dimensional regions that can be defined by n different planes.
15 Josephus had a friend who was saved by getting into the next-to-last
position. What is I(n), the number of the penultimate survivor when
every second person is executed?
16 Use the repertoire method to solve the general four-parameter recurrence

\begin{align*}
g(1) &= \alpha; \\
g(2n + j) &= 3g(n) + \gamma n + \beta_j, && \text{for } j = 0, 1 \text{ and } n \ge 1.
\end{align*}

Hint: Try the function g(n) = n.
Exam problems
17 If $W_n$ is the minimum number of moves needed to transfer a tower of n
disks from one peg to another when there are four pegs instead of three,
show that

\[ W_{n(n+1)/2} \le 2W_{n(n-1)/2} + T_n, \qquad \text{for } n > 0. \]

(Here $T_n = 2^n - 1$ is the ordinary three-peg number.) Use this to find a
closed form f(n) such that $W_{n(n+1)/2} \le f(n)$ for all $n \ge 0$.

Is this like a five-star general recurrence?
18 Show that the following set of n bent lines defines $Z_n$ regions, where
$Z_n$ is defined in (1.7): The jth bent line, for $1 \le j \le n$, has its zig
at $(n^{2j}, 0)$ and goes up through the points $(n^{2j} - n^j, 1)$ and
$(n^{2j} - n^j - n^{-n}, 1)$.
19 Is it possible to obtain $Z_n$ regions with n bent lines when the angle at
each zig is 30°?
20 Use the repertoire method to solve the general five-parameter recurrence

\begin{align*}
h(1) &= \alpha; \\
h(2n + j) &= 4h(n) + \gamma_j n + \beta_j, && \text{for } j = 0, 1 \text{ and } n \ge 1.
\end{align*}

Hint: Try the functions h(n) = n and $h(n) = n^2$.
21 Suppose there are 2n people in a circle; the first n are “good guys”
and the last n are “bad guys.” Show that there is always an integer m
(depending on n) such that, if we go around the circle executing every
mth person, all the bad guys are first to go. (For example, when n = 3
we can take m = 5; when n = 4 we can take m = 30.)
Bonus problems
22 Show that it’s possible to construct a Venn diagram for all $2^n$ possible
subsets of n given sets, using n convex polygons that are congruent to
each other and rotated about a common center.
23 Suppose that Josephus finds himself in a given position j, but he has a
chance to name the elimination parameter q such that every qth person
is executed. Can he always save himself?
Research problems
24 Find all recurrence relations of the form

\[ X_n = \frac{a_0 + a_1 X_{n-1} + \cdots + a_k X_{n-k}}{b_1 X_{n-1} + \cdots + b_k X_{n-k}} \]

whose solution is periodic.
25
Solve infinitely many cases of the four-peg Tower of Hanoi problem by
proving that equality holds in the relation of exercise 17.
26 Generalizing exercise 23, let’s say that a Josephus subset of {1, 2, ..., n}
is a set of k numbers such that, for some q, the people with the other n - k
numbers will be eliminated first. (These are the k positions of the “good
guys” Josephus wants to save.) It turns out that when n = 9, three of the
$2^9$ possible subsets are non-Josephus, namely {1,2,5,8,9}, {2,3,4,5,8},
and {2,5,6,7,8}. There are 13 non-Josephus sets when n = 12, none for
any other values of $n \le 12$. Are non-Josephus subsets rare for large n?

Yes, and well done if you find them.
2
Sums
SUMS ARE EVERYWHERE in mathematics, so we need basic tools to handle
them. This chapter develops the notation and general techniques that make
summation user-friendly.
2.1 NOTATION
In Chapter 1 we encountered the sum of the first n integers, which
we wrote out as $1 + 2 + 3 + \cdots + (n-1) + n$. The ‘$\cdots$’ in such
formulas tells us to complete the pattern established by the surrounding
terms. Of course we have to watch out for sums like $1 + 7 + \cdots + 41.7$,
which are meaningless without a mitigating context. On the other hand, the
inclusion of terms like 3 and (n - 1) was a bit of overkill; the pattern would
presumably have been clear if we had written simply $1 + 2 + \cdots + n$.
Sometimes we might even be so bold as to write just $1 + \cdots + n$.
We’ll be working with sums of the general form

\[ a_1 + a_2 + \cdots + a_n, \tag{2.1} \]

where each $a_k$ is a number that has been defined somehow. This notation has
the advantage that we can “see” the whole sum, almost as if it were written
out in full, if we have a good enough imagination.
Each element $a_k$ of a sum is called a term. The terms are often specified
implicitly as formulas that follow a readily perceived pattern, and in such cases
we must sometimes write them in an expanded form so that the meaning is
clear. For example, if

\[ 1 + 2 + \cdots + 2^{n-1} \]

is supposed to denote a sum of n terms, not of $2^{n-1}$, we should write it more
explicitly as

\[ 2^0 + 2^1 + \cdots + 2^{n-1}. \]

A term is how long this course lasts.
The three-dots notation has many uses, but it can be ambiguous and a
bit long-winded. Other alternatives are available, notably the delimited form

\[ \sum_{k=1}^{n} a_k, \tag{2.2} \]

which is called Sigma-notation because it uses the Greek letter $\Sigma$
(upper-case sigma). This notation tells us to include in the sum precisely those
terms $a_k$ whose index k is an integer that lies between the lower and upper
limits 1 and n, inclusive. In words, we “sum over k, from 1 to n.” Joseph
Fourier introduced this delimited $\Sigma$-notation in 1820, and it soon took the
mathematical world by storm.

“The sign $\sum_{i=1}^{i=\infty}$ indicates that one must give the integer i all
its values 1, 2, 3, ..., and take the sum of the terms.”
- J. Fourier [102]
Incidentally, the quantity after $\sum$ (here $a_k$) is called the summand.
The index variable k is said to be bound to the $\sum$ sign in (2.2), because
the k in $a_k$ is unrelated to appearances of k outside the Sigma-notation. Any
other letter could be substituted for k here without changing the meaning of
(2.2). The letter i is often used (perhaps because it stands for “index”), but
we’ll generally sum on k since it’s wise to keep i for $\sqrt{-1}$.

Well, I wouldn’t want to use a or n as the index variable instead of k in
(2.2); those letters are “free variables” that do have meaning outside the
$\sum$ here.

It turns out that a generalized Sigma-notation is even more useful than
the delimited form: We simply write one or more conditions under the $\sum$,
to specify the set of indices over which summation should take place. For
example, the sums in (2.1) and (2.2) can also be written as

\[ \sum_{1 \le k \le n} a_k. \tag{2.3} \]
In this particular example there isn’t much difference between the new form
and (2.2), but the general form allows us to take sums over index sets that
aren’t restricted to consecutive integers. For example, we can express the sum
of the squares of all odd positive integers below 100 as follows:

\[ \sum_{\substack{1 \le k < 100 \\ k \text{ odd}}} k^2. \]

The delimited equivalent of this sum,

\[ \sum_{k=0}^{49} (2k+1)^2, \]

is more cumbersome and less clear. Similarly, the sum of reciprocals of all
prime numbers between 1 and N is

\[ \sum_{\substack{p \le N \\ p \text{ prime}}} \frac{1}{p}; \]

the delimited form would require us to write

\[ \sum_{k=1}^{\pi(N)} \frac{1}{p_k}, \]

where $p_k$ denotes the kth prime and $\pi(N)$ is the number of primes $\le N$.
(Incidentally, this sum gives the approximate average number of distinct prime
factors of a random integer near N, since about 1/p of those integers are
divisible by p. Its value for large N is approximately ln ln N + 0.261972128;
ln x stands for the natural logarithm of x, and ln ln x stands for ln(ln x).)

The summation symbol looks like a distorted pacman.

The biggest advantage of general Sigma-notation is that we can manipulate
it more easily than the delimited form. For example, suppose we want
to change the index variable k to k + 1. With the general form, we have

\[ \sum_{1 \le k \le n} a_k = \sum_{1 \le k+1 \le n} a_{k+1}; \]

it’s easy to see what’s going on, and we can do the substitution almost without
thinking. But with the delimited form, we have

\[ \sum_{k=1}^{n} a_k = \sum_{k=0}^{n-1} a_{k+1}; \]

it’s harder to see what’s happened, and we’re more likely to make a mistake.

A tidy sum.
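The contrast between the two notations has a direct analogue in code: a condition-under-the-sum corresponds to a filter, while the delimited form corresponds to an explicit index range. The Python sketch below (illustrative names, with an arbitrary sample summand $a_k = k^3$) computes the odd-squares sum both ways and checks the index shift from k to k + 1.

```python
def odd_squares_general(limit: int = 100) -> int:
    """Sum of k^2 over 1 <= k < limit with k odd (condition written as a filter)."""
    return sum(k * k for k in range(1, limit) if k % 2 == 1)


def odd_squares_delimited() -> int:
    """Delimited equivalent for limit = 100: sum of (2k+1)^2 for k = 0..49."""
    return sum((2 * k + 1) ** 2 for k in range(0, 50))


def a(k: int) -> int:
    """A sample summand a_k, chosen arbitrarily for this illustration."""
    return k ** 3


def shifted_sum(n: int) -> int:
    """Sum of a_{k+1} over all k with 1 <= k+1 <= n, scanning a wider range of k."""
    return sum(a(k + 1) for k in range(-3, n + 3) if 1 <= k + 1 <= n)


if __name__ == "__main__":
    print(odd_squares_general(), odd_squares_delimited())
    print(sum(a(k) for k in range(1, 11)), shifted_sum(10))
```

In `shifted_sum` the condition does all the work; the scanned range only has to be wide enough, which mirrors how relations under the sum free us from adjusting limits by hand.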
On the other hand, the delimited form isn’t completely useless. It’s
nice and tidy, and we can write it quickly because (2.2) has seven symbols
compared with (2.3)’s eight. Therefore we’ll often use $\sum$ with upper and
lower delimiters when we state a problem or present a result, but we’ll prefer
to work with relations-under-$\sum$ when we’re manipulating a sum whose index
variables need to be transformed.

The $\sum$ sign occurs more than 1000 times in this book, so we should be
sure that we know exactly what it means. Formally, we write

\[ \sum_{P(k)} a_k \tag{2.4} \]

as an abbreviation for the sum of all terms $a_k$ such that k is an integer
satisfying a given property P(k). (A “property P(k)” is any statement about
k that can be either true or false.) For the time being, we’ll assume that
only finitely many integers k satisfying P(k) have $a_k \ne 0$; otherwise
infinitely many nonzero numbers are being added together, and things can get
a bit tricky. At the other extreme, if P(k) is false for all integers k, we
have an “empty” sum; the value of an empty sum is defined to be zero.

That’s nothing. You should see how many times $\Sigma$ appears in The Iliad.
A slightly modified form of (2.4) is used when a sum appears within the
text of a paragraph rather than in a displayed equation: We write
‘$\sum_{P(k)} a_k$’, attaching property P(k) as a subscript of $\sum$, so that
the formula won’t stick out too much. Similarly, ‘$\sum_{k=1}^{n} a_k$’ is a
convenient alternative to (2.2) when we want to confine the notation to a
single line.
People are often tempted to write

\[ \sum_{k=2}^{n-1} k(k-1)(n-k) \qquad \text{instead of} \qquad \sum_{k=0}^{n} k(k-1)(n-k) \]

because the terms for k = 0, 1, and n in this sum are zero. Somehow it
seems more efficient to add up n - 2 terms instead of n + 1 terms. But such
temptations should be resisted; efficiency of computation is not the same as
efficiency of understanding! We will find it advantageous to keep upper and
lower bounds on an index of summation as simple as possible, because sums
can be manipulated much more easily when the bounds are simple. Indeed,
the form $\sum_{k=2}^{n-1}$ can even be dangerously ambiguous, because its
meaning is not at all clear when n = 0 or n = 1 (see exercise 1). Zero-valued
terms cause no harm, and they often save a lot of trouble.
So far the notations we’ve been discussing are quite standard, but now
we are about to make a radical departure from tradition. Kenneth Iverson
introduced a wonderful idea in his programming language APL [161, page 11],
and we’ll see that it greatly simplifies many of the things we want to do in
this book. The idea is simply to enclose a true-or-false statement in brackets,
and to say that the result is 1 if the statement is true, 0 if the statement is
false. For example,

\[ [p \text{ prime}] = \begin{cases} 1, & \text{if } p \text{ is a prime number;} \\ 0, & \text{if } p \text{ is not a prime number.} \end{cases} \]

Iverson’s convention allows us to express sums with no constraints whatever
on the index of summation, because we can rewrite (2.4) in the form

\[ \sum_{k} a_k\,[P(k)]. \tag{2.5} \]

If P(k) is false, the term $a_k[P(k)]$ is zero, so we can safely include it among
the terms being summed. This makes it easy to manipulate the index of
summation, because we don’t have to fuss with boundary conditions.

Hey: The “Kronecker delta” that I’ve seen in other books (I mean
$\delta_{kn}$, which is 1 if k = n, 0 otherwise) is just a special case of
Iverson’s convention: We can write [k = n] instead.
A slight technicality needs to be mentioned: Sometimes $a_k$ isn’t defined
for all integers k. We get around this difficulty by assuming that [P(k)] is
“very strongly zero” when P(k) is false; it’s so much zero, it makes $a_k[P(k)]$
equal to zero even when $a_k$ is undefined. For example, if we use Iverson’s
convention to write the sum of reciprocal primes ≤ N as

$$\sum_p [p \text{ prime}]\,[p \le N]/p,$$

there’s no problem of division by zero when p = 0, because our convention
tells us that $[0 \text{ prime}]\,[0 \le N]/0 = 0$.
Let’s sum up what we’ve discussed so far about sums. There are two
good ways to express a sum of terms: One way uses ‘···’, the other uses ‘Σ’.
The three-dots form often suggests useful manipulations, particularly
the combination of adjacent terms, since we might be able to spot a simplifying
pattern if we let the whole sum hang out before our eyes. But too much detail
can also be overwhelming. Sigma-notation is compact, impressive to family
and friends, and often suggestive of manipulations that are not obvious in
three-dots form. When we work with Sigma-notation, zero terms are not
generally harmful; in fact, zeros often make Σ-manipulation easier.

[Margin note: ... and it’s less likely to lose points on an exam for “lack of rigor.”]
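Iverson’s convention, including the “very strongly zero” rule, can be tried out in a few lines of Python. The sketch below is only an illustration under stated assumptions: the helper names `is_prime`, `bracket_sum`, and `reciprocal_prime_sum` are hypothetical, and “strongly zero” is modeled by testing the bracket before the term is ever evaluated.

```python
from fractions import Fraction

def is_prime(p):
    """Trial division; adequate for small illustrative inputs."""
    if p < 2:
        return False
    d = 2
    while d * d <= p:
        if p % d == 0:
            return False
        d += 1
    return True

def bracket_sum(term, prop, candidates):
    """Sum term(k) * [prop(k)] over the candidate integers.

    The bracket is 'very strongly zero': term(k) is evaluated only
    when prop(k) is true, so an undefined term does no harm.
    """
    total = Fraction(0)
    for k in candidates:
        if prop(k):
            total += term(k)
    return total

def reciprocal_prime_sum(N):
    # Sum over p of [p prime][p <= N]/p; p = 0 is deliberately included.
    return bracket_sum(lambda p: Fraction(1, p),
                       lambda p: is_prime(p) and p <= N,
                       range(0, N + 1))
```

Calling `reciprocal_prime_sum(10)` adds 1/2 + 1/3 + 1/5 + 1/7 exactly, and the term 1/0 is never formed.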
2.2 SUMS AND RECURRENCES
OK, we understand now how to express sums with fancy notation.
But how does a person actually go about finding the value of a sum? One way
is to observe that there’s an intimate relation between sums and recurrences.
The sum

$$S_n = \sum_{k=0}^{n} a_k$$

is equivalent to the recurrence

$$S_0 = a_0; \qquad S_n = S_{n-1} + a_n, \quad \text{for } n > 0. \tag{2.6}$$

[Margin note: (Think of $S_n$ as not just a single number, but as a sequence defined for all n ≥ 0.)]
Therefore we can evaluate sums in closed form by using the methods we
learned in Chapter 1 to solve recurrences in closed form.
For example, if $a_n$ is equal to a constant plus a multiple of n, the sum-recurrence
(2.6) takes the following general form:

$$R_0 = \alpha; \qquad R_n = R_{n-1} + \beta + \gamma n, \quad \text{for } n > 0. \tag{2.7}$$

Proceeding as in Chapter 1, we find $R_1 = \alpha + \beta + \gamma$, $R_2 = \alpha + 2\beta + 3\gamma$, and
so on; in general the solution can be written in the form

$$R_n = A(n)\,\alpha + B(n)\,\beta + C(n)\,\gamma, \tag{2.8}$$
where A(n), B(n), and C(n) are the coefficients of dependence on the general
parameters α, β, and γ.
The repertoire method tells us to try plugging in simple functions of n
for $R_n$, hoping to find constant parameters α, β, and γ where the solution is
especially simple. Setting $R_n = 1$ implies α = 1, β = 0, γ = 0; hence

$$A(n) = 1.$$

Setting $R_n = n$ implies α = 0, β = 1, γ = 0; hence

$$B(n) = n.$$

Setting $R_n = n^2$ implies α = 0, β = −1, γ = 2; hence

$$2C(n) - B(n) = n^2$$

and we have $C(n) = (n^2 + n)/2$. Easy as pie.
Therefore if we wish to evaluate

$$\sum_{k=0}^{n} (a + bk),$$

the sum-recurrence (2.6) boils down to (2.7) with α = β = a, γ = b, and the
answer is $aA(n) + aB(n) + bC(n) = a(n+1) + b(n+1)n/2$.
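The repertoire solution can be checked numerically. The following Python sketch (function names are hypothetical, chosen for this illustration) iterates the recurrence $R_0 = \alpha$, $R_n = R_{n-1} + \beta + \gamma n$ directly and compares it with $A(n)\alpha + B(n)\beta + C(n)\gamma$ using A(n) = 1, B(n) = n, C(n) = (n² + n)/2.

```python
from fractions import Fraction

def R_by_recurrence(n, alpha, beta, gamma):
    """Iterate R_0 = alpha; R_n = R_{n-1} + beta + gamma*n."""
    r = alpha
    for m in range(1, n + 1):
        r = r + beta + gamma * m
    return r

def R_closed(n, alpha, beta, gamma):
    """A(n)*alpha + B(n)*beta + C(n)*gamma with A = 1, B = n, C = (n^2 + n)/2."""
    A, B, C = 1, n, Fraction(n * n + n, 2)
    return A * alpha + B * beta + C * gamma

def arithmetic_sum(n, a, b):
    """sum_{k=0}^{n} (a + b*k), obtained by setting alpha = beta = a, gamma = b."""
    return R_closed(n, a, a, b)
```

For instance, `arithmetic_sum(4, 2, 3)` should agree with adding 2 + 5 + 8 + 11 + 14 term by term.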
[Margin note: Actually easier; $\pi = \sum_{n \ge 0} \frac{8}{(4n+1)(4n+3)}$.]

Conversely, many recurrences can be reduced to sums; therefore the special
methods for evaluating sums that we’ll be learning later in this chapter
will help us solve recurrences that might otherwise be difficult. The Tower of
Hanoi recurrence is a case in point:

$$T_0 = 0; \qquad T_n = 2T_{n-1} + 1, \quad \text{for } n > 0.$$

It can be put into the special form (2.6) if we divide both sides by $2^n$:

$$T_0/2^0 = 0; \qquad T_n/2^n = T_{n-1}/2^{n-1} + 1/2^n, \quad \text{for } n > 0.$$

Now we can set $S_n = T_n/2^n$, and we have

$$S_0 = 0; \qquad S_n = S_{n-1} + 2^{-n}, \quad \text{for } n > 0.$$

It follows that

$$S_n = \sum_{k=1}^{n} 2^{-k}.$$

(Notice that we’ve left the term for k = 0 out of this sum.) The sum of the
geometric series $2^{-1} + 2^{-2} + \cdots + 2^{-n} = (\tfrac12)^1 + (\tfrac12)^2 + \cdots + (\tfrac12)^n$ will be derived
later in this chapter; it turns out to be $1 - (\tfrac12)^n$. Hence

$$T_n = 2^n S_n = 2^n - 1.$$
We have converted $T_n$ to $S_n$ in this derivation by noticing that the recurrence
could be divided by $2^n$. This trick is a special case of a general
technique that can reduce virtually any recurrence of the form

$$a_n T_n = b_n T_{n-1} + c_n \tag{2.9}$$

to a sum. The idea is to multiply both sides by a summation factor, $s_n$:

$$s_n a_n T_n = s_n b_n T_{n-1} + s_n c_n.$$

This factor $s_n$ is cleverly chosen to make

$$s_n b_n = s_{n-1} a_{n-1}.$$

Then if we write $S_n = s_n a_n T_n$ we have a sum-recurrence,

$$S_n = S_{n-1} + s_n c_n.$$

Hence

$$S_n = s_0 a_0 T_0 + \sum_{k=1}^{n} s_k c_k = s_1 b_1 T_0 + \sum_{k=1}^{n} s_k c_k,$$

and the solution to the original recurrence (2.9) is

$$T_n = \frac{1}{s_n a_n} \Bigl( s_1 b_1 T_0 + \sum_{k=1}^{n} s_k c_k \Bigr). \tag{2.10}$$

For example, when n = 1 we get $T_1 = (s_1 b_1 T_0 + s_1 c_1)/s_1 a_1 = (b_1 T_0 + c_1)/a_1$.

[Margin note: (The value of $s_1$ cancels out, so it can be anything but zero.)]
But how can we be clever enough to find the right $s_n$? No problem: The
relation $s_n = s_{n-1} a_{n-1}/b_n$ can be unfolded to tell us that the fraction

$$s_n = \frac{a_{n-1} a_{n-2} \ldots a_1}{b_n b_{n-1} \ldots b_2}, \tag{2.11}$$

or any convenient constant multiple of this value, will be a suitable summation
factor. For example, the Tower of Hanoi recurrence has $a_n = 1$ and $b_n = 2$;
the general method we’ve just derived says that $s_n = 2^{-n}$ is a good thing to
multiply by, if we want to reduce the recurrence to a sum. We don’t need a
brilliant flash of inspiration to discover this multiplier.
We must be careful, as always, not to divide by zero. The summation-
factor method works whenever all the a’s and all the b’s are nonzero.
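The summation-factor method is mechanical enough to sketch in code. The Python below is an illustration, not a definitive implementation: the names `solve_by_summation_factor` and `by_iteration` are hypothetical, the coefficients are passed as functions of the index, and $s_1$ is arbitrarily set to 1 (any nonzero value would do, since it cancels).

```python
from fractions import Fraction

def solve_by_summation_factor(a, b, c, T0, n):
    """Solve a(n) T_n = b(n) T_{n-1} + c(n) via (2.10) and (2.11).

    a, b, c are functions of the index; every a(k) and b(k) must be nonzero.
    """
    if n == 0:
        return Fraction(T0)
    # Build s_k from the relation s_k = s_{k-1} * a(k-1) / b(k), with s_1 = 1.
    s = {1: Fraction(1)}
    for k in range(2, n + 1):
        s[k] = s[k - 1] * Fraction(a(k - 1)) / Fraction(b(k))
    total = s[1] * b(1) * T0 + sum(s[k] * c(k) for k in range(1, n + 1))
    return total / (s[n] * a(n))

def by_iteration(a, b, c, T0, n):
    """Reference answer: unroll the recurrence one step at a time."""
    t = Fraction(T0)
    for k in range(1, n + 1):
        t = (b(k) * t + c(k)) / Fraction(a(k))
    return t
```

With a(n) = 1, b(n) = 2, c(n) = 1 and $T_0 = 0$ this reproduces the Tower of Hanoi values $2^n - 1$.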
Let’s apply these ideas to a recurrence that arises in the study of “quicksort,”
one of the most important methods for sorting data inside a computer.
The average number of comparison steps made by quicksort when it is applied
to n items in random order satisfies the recurrence

$$C_0 = 0; \qquad C_n = n + 1 + \frac{2}{n} \sum_{k=0}^{n-1} C_k, \quad \text{for } n > 0. \tag{2.12}$$

Hmmm. This looks much scarier than the recurrences we’ve seen before; it
includes a sum over all previous values, and a division by n. Trying small
cases gives us some data ($C_1 = 2$, $C_2 = 5$, $C_3 = \frac{26}{3}$) but doesn’t do anything
to quell our fears.

[Margin note: (Quicksort was invented by Hoare in 1962 [158].)]
We can, however, reduce the complexity of (2.12) systematically, by first
getting rid of the division and then getting rid of the Σ sign. The idea is to
multiply both sides by n, obtaining the relation

$$n C_n = n^2 + n + 2 \sum_{k=0}^{n-1} C_k, \quad \text{for } n > 0;$$

hence, if we replace n by n − 1,

$$(n-1) C_{n-1} = (n-1)^2 + (n-1) + 2 \sum_{k=0}^{n-2} C_k, \quad \text{for } n - 1 > 0.$$

We can now subtract the second equation from the first, and the Σ sign
disappears:

$$n C_n - (n-1) C_{n-1} = 2n + 2C_{n-1}, \quad \text{for } n > 1.$$

It turns out that this relation also holds when n = 1, because $C_1 = 2$. Therefore
the original recurrence for $C_n$ reduces to a much simpler one:

$$C_0 = 0; \qquad n C_n = (n+1) C_{n-1} + 2n, \quad \text{for } n > 0.$$
Progress. We’re now in a position to apply a summation factor, since this
recurrence has the form of (2.9) with $a_n = n$, $b_n = n + 1$, and $c_n = 2n$.
The general method described on the preceding page tells us to multiply the
recurrence through by some multiple of

$$s_n = \frac{a_{n-1} a_{n-2} \ldots a_1}{b_n b_{n-1} \ldots b_2} = \frac{(n-1)\cdot(n-2)\cdot\ldots\cdot1}{(n+1)\cdot n\cdot\ldots\cdot3} = \frac{2}{(n+1)n}.$$

The solution, according to (2.10), is therefore

$$C_n = 2(n+1) \sum_{k=1}^{n} \frac{1}{k+1}.$$

[Margin note: We started with a Σ in the recurrence, and worked hard to get rid of it. But then after applying a summation factor, we came up with another Σ. Are sums good, or bad, or what?]

[Margin note: But your spelling is alwrong.]
The sum that remains is very similar to a quantity that arises frequently
in applications. It arises so often, in fact, that we give it a special name and
a special notation:

$$H_n = 1 + \frac12 + \cdots + \frac1n = \sum_{k=1}^{n} \frac1k. \tag{2.13}$$

The letter H stands for “harmonic”; $H_n$ is a harmonic number, so called
because the kth harmonic produced by a violin string is the fundamental
tone produced by a string that is 1/k times as long.
We can complete our study of the quicksort recurrence (2.12) by putting
$C_n$ into closed form; this will be possible if we can express $C_n$ in terms of $H_n$.
The sum in our formula for $C_n$ is

$$\sum_{k=1}^{n} \frac{1}{k+1} = \sum_{1\le k\le n} \frac{1}{k+1}.$$

We can relate this to $H_n$ without much difficulty by changing k to k − 1 and
revising the boundary conditions:

$$\begin{aligned}
\sum_{1\le k\le n} \frac{1}{k+1} &= \sum_{1\le k-1\le n} \frac1k \\
&= \sum_{2\le k\le n+1} \frac1k \\
&= \Bigl( \sum_{1\le k\le n} \frac1k \Bigr) - \frac11 + \frac{1}{n+1} = H_n - \frac{n}{n+1}.
\end{aligned}$$

Alright! We have found the sum needed to complete the solution to (2.12):
The average number of comparisons made by quicksort when it is applied to
n randomly ordered items of data is

$$C_n = 2(n+1) H_n - 2n. \tag{2.14}$$

As usual, we check that small cases are correct: $C_0 = 0$, $C_1 = 2$, $C_2 = 5$.
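The small-case check can be extended by machine. This Python sketch (helper names are hypothetical) computes $C_n$ directly from the original recurrence (2.12) with exact rational arithmetic and compares it against the closed form (2.14).

```python
from fractions import Fraction

def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n, with H_0 = 0."""
    return sum(Fraction(1, k) for k in range(1, n + 1))

def quicksort_C_recurrence(n_max):
    """List [C_0, ..., C_{n_max}] from C_0 = 0, C_n = n + 1 + (2/n) * sum_{k<n} C_k."""
    C = [Fraction(0)]
    for n in range(1, n_max + 1):
        C.append(n + 1 + Fraction(2, n) * sum(C))
    return C

def quicksort_C_closed(n):
    """Closed form (2.14): 2(n+1)H_n - 2n."""
    return 2 * (n + 1) * harmonic(n) - 2 * n
```

The two computations should agree for every n tried, including the value $C_3 = 26/3$.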
2.3 MANIPULATION OF SUMS
The key to success with sums is an ability to change one Σ into
another that is simpler or closer to some goal. And it’s easy to do this by
learning a few basic rules of transformation and by practicing their use.

[Margin note: Not to be confused with finance.]
Let K be any finite set of integers. Sums over the elements of K can be
transformed by using three simple rules:

$$\sum_{k\in K} c\,a_k = c \sum_{k\in K} a_k; \qquad \text{(distributive law)} \tag{2.15}$$

$$\sum_{k\in K} (a_k + b_k) = \sum_{k\in K} a_k + \sum_{k\in K} b_k; \qquad \text{(associative law)} \tag{2.16}$$

$$\sum_{k\in K} a_k = \sum_{p(k)\in K} a_{p(k)}. \qquad \text{(commutative law)} \tag{2.17}$$

The distributive law allows us to move constants in and out of a Σ. The
associative law allows us to break a Σ into two parts, or to combine two Σ’s
into one. The commutative law says that we can reorder the terms in any way
we please; here p(k) is any permutation of the set of all integers. For example,
if K = {−1, 0, +1} and if p(k) = −k, these three laws tell us respectively that

$$c\,a_{-1} + c\,a_0 + c\,a_1 = c\,(a_{-1} + a_0 + a_1); \qquad \text{(distributive law)}$$

$$(a_{-1} + b_{-1}) + (a_0 + b_0) + (a_1 + b_1) = (a_{-1} + a_0 + a_1) + (b_{-1} + b_0 + b_1); \qquad \text{(associative law)}$$

$$a_{-1} + a_0 + a_1 = a_1 + a_0 + a_{-1}. \qquad \text{(commutative law)}$$

[Margin note: Why not call it permutative instead of commutative?]
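The three laws can be exercised on concrete data. In the Python sketch below the function names are hypothetical, sequences are stored as dictionaries, and the commutative law is tested by scanning a finite `window` of integers k for those with p(k) in K; the window is an assumption of this illustration and must be wide enough to contain every such k.

```python
def distributive_holds(a, c, K):
    """(2.15): sum of c*a_k equals c times the sum of a_k."""
    return sum(c * a[k] for k in K) == c * sum(a[k] for k in K)

def associative_holds(a, b, K):
    """(2.16): a sum of (a_k + b_k) splits into two sums."""
    return (sum(a[k] + b[k] for k in K)
            == sum(a[k] for k in K) + sum(b[k] for k in K))

def commutative_side(a, K, p, window):
    """Right side of (2.17): sum of a_{p(k)} over k in window with p(k) in K."""
    return sum(a[p(k)] for k in window if p(k) in K)

K = {-1, 0, 1}
a = {-1: 5, 0: 7, 1: 11}
b = {-1: 2, 0: 3, 1: 4}
window = range(-10, 11)
```

With these sample values, p(k) = −k and p(k) = k + 3 both leave the sum unchanged, whereas p(k) = 2k (not a permutation of the integers) picks up only the term $a_0$.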
Gauss’s trick in Chapter 1 can be viewed as an application of these three
basic laws. Suppose we want to compute the general sum of an arithmetic
progression,

$$S = \sum_{0\le k\le n} (a + bk).$$

By the commutative law we can replace k by n − k, obtaining

$$S = \sum_{0\le n-k\le n} \bigl(a + b(n-k)\bigr) = \sum_{0\le k\le n} (a + bn - bk).$$

These two equations can be added by using the associative law:

$$2S = \sum_{0\le k\le n} \bigl((a + bk) + (a + bn - bk)\bigr) = \sum_{0\le k\le n} (2a + bn).$$

[Margin note: This is something like changing variables inside an integral, but easier.]
[Margin note: “What’s one and one and one and one and one and one and one and one and one and one?” “I don’t know,” said Alice. “I lost count.” “She can’t do Addition.” -Lewis Carroll [44]]

[Margin note: Additional, eh?]
And we can now apply the distributive law and evaluate a trivial sum:

$$2S = (2a + bn) \sum_{0\le k\le n} 1 = (2a + bn)(n + 1).$$

Dividing by 2, we have proved that

$$\sum_{k=0}^{n} (a + bk) = (a + \tfrac12 bn)(n + 1). \tag{2.18}$$

The right-hand side can be remembered as the average of the first and last
terms, namely $\tfrac12\bigl(a + (a + bn)\bigr)$, times the number of terms, namely (n + 1).
It’s important to bear in mind that the function p(k) in the general
commutative law (2.17) is supposed to be a permutation of all the integers. In
other words, for every integer n there should be exactly one integer k such that
p(k) = n. Otherwise the commutative law might fail; exercise 3 illustrates
this with a vengeance. Transformations like p(k) = k + c or p(k) = c − k,
where c is an integer constant, are always permutations, so they always work.
On the other hand, we can relax the permutation restriction a little bit:
We need to require only that there be exactly one integer k with p(k) = n
when n is an element of the index set K. If n ∉ K (that is, if n is not in K),
it doesn’t matter how often p(k) = n occurs, because such k don’t take part
in the sum. Thus, for example, we can argue that

$$\sum_{\substack{k\in K \\ k \text{ even}}} a_k = \sum_{\substack{n\in K \\ n \text{ even}}} a_n = \sum_{\substack{2k\in K \\ 2k \text{ even}}} a_{2k} = \sum_{2k\in K} a_{2k}, \tag{2.19}$$

since there’s exactly one k such that 2k = n when n ∈ K and n is even.
Iverson’s convention, which allows us to obtain the values 0 or 1 from
logical statements in the middle of a formula, can be used together with the
distributive, associative, and commutative laws to deduce additional proper-
ties of sums. For example, here is an important rule for combining different
sets of indices: If K and K’ are any sets of integers, then
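
$$\sum_{k\in K} a_k + \sum_{k\in K'} a_k = \sum_{k\in K\cap K'} a_k + \sum_{k\in K\cup K'} a_k. \tag{2.20}$$

This follows from the general formulas

$$\sum_{k\in K} a_k = \sum_k a_k\,[k\in K] \tag{2.21}$$

and

$$[k\in K] + [k\in K'] = [k\in K\cap K'] + [k\in K\cup K']. \tag{2.22}$$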
Typically we use rule (2.20) either to combine two almost-disjoint index sets,
as in
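
$$\sum_{k=1}^{m} a_k + \sum_{k=m}^{n} a_k = a_m + \sum_{k=1}^{n} a_k, \quad \text{for } 1 \le m \le n;$$

or to split off a single term from a sum, as in

$$\sum_{0\le k\le n} a_k = a_0 + \sum_{1\le k\le n} a_k, \quad \text{for } n \ge 0. \tag{2.23}$$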
This operation of splitting off a term is the basis of a perturbation
method that often allows us to evaluate a sum in closed form. The idea
is to start with an unknown sum and call it $S_n$:

$$S_n = \sum_{0\le k\le n} a_k.$$

(Name and conquer.) Then we rewrite $S_{n+1}$ in two ways, by splitting off both
its last term and its first term:

$$\begin{aligned}
S_n + a_{n+1} = \sum_{0\le k\le n+1} a_k &= a_0 + \sum_{1\le k\le n+1} a_k \\
&= a_0 + \sum_{1\le k+1\le n+1} a_{k+1} \\
&= a_0 + \sum_{0\le k\le n} a_{k+1}.
\end{aligned} \tag{2.24}$$

Now we can work on this last sum and try to express it in terms of $S_n$. If we
succeed, we obtain an equation whose solution is the sum we seek.
For example, let’s use this approach to find the sum of a general geometric
progression,

$$S_n = \sum_{0\le k\le n} a x^k.$$

The general perturbation scheme in (2.24) tells us that

$$S_n + a x^{n+1} = a x^0 + \sum_{0\le k\le n} a x^{k+1},$$

and the sum on the right is $x \sum_{0\le k\le n} a x^k = x S_n$ by the distributive law.
Therefore $S_n + a x^{n+1} = a + x S_n$, and we can solve for $S_n$ to obtain

$$\sum_{k=0}^{n} a x^k = \frac{a - a x^{n+1}}{1 - x}, \quad \text{for } x \ne 1. \tag{2.25}$$

(When x = 1, the sum is of course simply (n + 1)a.) The right-hand side
can be remembered as the first term included in the sum minus the first term
excluded (the term after the last), divided by 1 minus the term ratio.

[Margin note: If it’s geometric, there should be a geometric proof.]

[Margin note: Ah yes, this formula was drilled into me in high school.]
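A direct translation of (2.25), with the x = 1 case handled separately, might look like the following Python sketch. The function name `geometric_sum` is hypothetical, and the inputs are converted to `Fraction` so that comparisons with term-by-term sums are exact.

```python
from fractions import Fraction

def geometric_sum(a, x, n):
    """sum_{k=0}^{n} a*x^k: first term minus first excluded term, over 1 - x."""
    a, x = Fraction(a), Fraction(x)
    if x == 1:
        return (n + 1) * a          # every term equals a
    return (a - a * x ** (n + 1)) / (1 - x)
```

Comparing against `sum(a * x**k for k in range(n + 1))` for a few rational values of x is a quick sanity check.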
That was almost too easy. Let’s try the perturbation technique on a
slightly more difficult sum,

$$S_n = \sum_{0\le k\le n} k\,2^k.$$

In this case we have $S_0 = 0$, $S_1 = 2$, $S_2 = 10$, $S_3 = 34$, $S_4 = 98$; what is the
general formula? According to (2.24) we have

$$S_n + (n+1)\,2^{n+1} = \sum_{0\le k\le n} (k+1)\,2^{k+1};$$

so we want to express the right-hand sum in terms of $S_n$. Well, we can break
it into two sums with the help of the associative law,

$$\sum_{0\le k\le n} k\,2^{k+1} + \sum_{0\le k\le n} 2^{k+1},$$

and the first of the remaining sums is $2S_n$. The other sum is a geometric
progression, which equals $(2 - 2^{n+2})/(1 - 2) = 2^{n+2} - 2$ by (2.25). Therefore
we have $S_n + (n+1)\,2^{n+1} = 2S_n + 2^{n+2} - 2$, and algebra yields

$$\sum_{0\le k\le n} k\,2^k = (n-1)\,2^{n+1} + 2.$$

Now we understand why $S_3 = 34$: It’s 32 + 2, not 2·17.
A similar derivation with x in place of 2 would have given us the equation
$S_n + (n+1)x^{n+1} = x S_n + (x - x^{n+2})/(1 - x)$; hence we can deduce that

$$\sum_{k=0}^{n} k x^k = \frac{x - (n+1)x^{n+1} + n x^{n+2}}{(1-x)^2}, \quad \text{for } x \ne 1. \tag{2.26}$$
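Both closed forms are easy to confirm numerically. In this Python sketch the names `sum_k_xk` and `sum_k_2k` are hypothetical; the first implements (2.26) with exact rationals and the second the special case x = 2.

```python
from fractions import Fraction

def sum_k_xk(x, n):
    """sum_{k=0}^{n} k*x^k by (2.26); the formula is stated for x != 1."""
    x = Fraction(x)
    if x == 1:
        raise ValueError("(2.26) requires x != 1")
    return (x - (n + 1) * x ** (n + 1) + n * x ** (n + 2)) / (1 - x) ** 2

def sum_k_2k(n):
    """Special case x = 2: (n - 1) * 2^(n+1) + 2."""
    return (n - 1) * 2 ** (n + 1) + 2
```

The list `[sum_k_2k(n) for n in range(5)]` should reproduce the small cases 0, 2, 10, 34, 98 quoted in the text.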
It’s interesting to note that we could have derived this closed form in a
completely different way, by using elementary techniques of differential cal-
culus. If we start with the equation

$$\sum_{k=0}^{n} x^k = \frac{1 - x^{n+1}}{1 - x}$$

and take the derivative of both sides with respect to x, we get

$$\sum_{k=0}^{n} k x^{k-1} = \frac{(1-x)\bigl(-(n+1)x^n\bigr) + 1 - x^{n+1}}{(1-x)^2} = \frac{1 - (n+1)x^n + n x^{n+1}}{(1-x)^2},$$
2.4 MULTIPLE SUMS
The middle term of this law is a sum over two indices. On the left, $\sum_j \sum_k$
stands for summing first on k, then on j. On the right, $\sum_k \sum_j$ stands for
summing first on j, then on k. In practice when we want to evaluate a double
sum in closed form, it’s usually easier to sum it first on one index rather than
on the other; we get to choose whichever is more convenient.

[Margin note: Who’s panicking? I think this rule is fairly obvious compared to some of the stuff in Chapter 1.]

Sums of sums are no reason to panic, but they can appear confusing to
a beginner, so let’s do some more examples. The nine-term sum we began
with provides a good illustration of the manipulation of double sums, because
that sum can actually be simplified, and the simplification process is typical
of what we can do with ΣΣ’s:
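
$$\begin{aligned}
\sum_{1\le j,k\le 3} a_j b_k &= \sum_{j,k} a_j b_k\,[1\le j,k\le 3] = \sum_{j,k} a_j b_k\,[1\le j\le 3]\,[1\le k\le 3] \\
&= \sum_j \sum_k a_j b_k\,[1\le j\le 3]\,[1\le k\le 3] \\
&= \sum_j a_j\,[1\le j\le 3] \sum_k b_k\,[1\le k\le 3] \\
&= \sum_j a_j\,[1\le j\le 3] \Bigl( \sum_k b_k\,[1\le k\le 3] \Bigr) \\
&= \Bigl( \sum_j a_j\,[1\le j\le 3] \Bigr) \Bigl( \sum_k b_k\,[1\le k\le 3] \Bigr) \\
&= \Bigl( \sum_{j=1}^{3} a_j \Bigr) \Bigl( \sum_{k=1}^{3} b_k \Bigr).
\end{aligned}$$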
The first line here denotes a sum of nine terms in no particular order. The
second line groups them in threes, $(a_1b_1 + a_1b_2 + a_1b_3) + (a_2b_1 + a_2b_2 + a_2b_3) + (a_3b_1 + a_3b_2 + a_3b_3)$.
The third line uses the distributive law to
factor out the a’s, since $a_j$ and [1 ≤ j ≤ 3] do not depend on k; this gives
$a_1(b_1 + b_2 + b_3) + a_2(b_1 + b_2 + b_3) + a_3(b_1 + b_2 + b_3)$. The fourth line is
the same as the third, but with a redundant pair of parentheses thrown in
so that the fifth line won’t look so mysterious. The fifth line factors out the
$(b_1 + b_2 + b_3)$ that occurs for each value of j: $(a_1 + a_2 + a_3)(b_1 + b_2 + b_3)$.
The last line is just another way to write the previous line. This method of
derivation can be used to prove a general distributive law,

$$\sum_{\substack{j\in J \\ k\in K}} a_j b_k = \Bigl( \sum_{j\in J} a_j \Bigr) \Bigl( \sum_{k\in K} b_k \Bigr), \tag{2.28}$$

valid for all sets of indices J and K.
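The general distributive law says that a double sum of products over a full rectangle of indices factors into a product of single sums. A small Python sketch (hypothetical function names, arbitrary sample data) makes the two sides concrete:

```python
from itertools import product

def double_sum(a, b, J, K):
    """Sum of a_j * b_k over all pairs (j, k) with j in J and k in K."""
    return sum(a[j] * b[k] for j, k in product(J, K))

def product_of_sums(a, b, J, K):
    """(sum of a_j over J) times (sum of b_k over K)."""
    return sum(a[j] for j in J) * sum(b[k] for k in K)

# Sample data indexed 1..3 (index 0 is an unused placeholder).
a = [0, 2, 3, 5]
b = [0, 7, 11, 13]
J = K = range(1, 4)
```

Here both sides come to (2 + 3 + 5)(7 + 11 + 13), and the index sets J and K need not be the same size.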
The basic law (2.27) for interchanging the order of summation has many
variations, which arise when we want to restrict the ranges of the indices
the sum of all elements on or above the main diagonal of this array. Because
$a_j a_k = a_k a_j$, the array is symmetrical about its main diagonal; therefore
$S_{◹}$ will be approximately half the sum of all the elements (except for a fudge
factor that takes account of the main diagonal).

[Margin note: Does rocky road have fudge in it?]
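Such considerations motivate the following manipulations. We have

$$S_{◹} = \sum_{1\le j\le k\le n} a_j a_k = \sum_{1\le k\le j\le n} a_k a_j = \sum_{1\le k\le j\le n} a_j a_k = S_{◺},$$

because we can rename (j, k) as (k, j). Furthermore, since

$$[1\le j\le k\le n] + [1\le k\le j\le n] = [1\le j,k\le n] + [1\le j=k\le n],$$

we have

$$2S_{◹} = S_{◹} + S_{◺} = \sum_{1\le j,k\le n} a_j a_k + \sum_{1\le j=k\le n} a_j a_k.$$

The first sum is $\bigl(\sum_{j=1}^{n} a_j\bigr)\bigl(\sum_{k=1}^{n} a_k\bigr) = \bigl(\sum_{k=1}^{n} a_k\bigr)^2$, by the general distributive
law (2.28). The second sum is $\sum_{k=1}^{n} a_k^2$. Therefore we have

$$S_{◹} = \sum_{1\le j\le k\le n} a_j a_k = \frac12 \Bigl( \Bigl( \sum_{k=1}^{n} a_k \Bigr)^2 + \sum_{k=1}^{n} a_k^2 \Bigr), \tag{2.33}$$

an expression for the upper triangular sum in terms of simpler single sums.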
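Formula (2.33) can be compared with a brute-force upper-triangular sum. The Python sketch below uses hypothetical function names and a 0-based list in place of $a_1, \ldots, a_n$; it assumes integer entries so that the division by 2 is exact.

```python
def upper_triangle_sum(a):
    """Sum of a_j * a_k over all index pairs with j <= k."""
    n = len(a)
    return sum(a[j] * a[k] for j in range(n) for k in range(j, n))

def upper_triangle_closed(a):
    """(2.33): ((sum a_k)^2 + sum a_k^2) / 2; exact for integer entries."""
    s = sum(a)
    q = sum(x * x for x in a)
    return (s * s + q) // 2
```

For a = [1, 2, 3] the six products are 1, 2, 3, 4, 6, 9, and the closed form gives (36 + 14)/2.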
Encouraged by such success, let’s look at another double sum:

$$S = \sum_{1\le j<k\le n} (a_k - a_j)(b_k - b_j).$$

Again we have symmetry when j and k are interchanged:

$$S = \sum_{1\le k<j\le n} (a_j - a_k)(b_j - b_k) = \sum_{1\le k<j\le n} (a_k - a_j)(b_k - b_j).$$

So we can add S to itself, making use of the identity

$$[1\le j<k\le n] + [1\le k<j\le n] = [1\le j,k\le n] - [1\le j=k\le n]$$

to conclude that

$$2S = \sum_{1\le j,k\le n} (a_j - a_k)(b_j - b_k) - \sum_{1\le j=k\le n} (a_j - a_k)(b_j - b_k).$$
The second sum here is zero; what about the first? It expands into four
separate sums, each of which is vanilla flavored:
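
$$\begin{aligned}
\sum_{1\le j,k\le n} a_j b_j &- \sum_{1\le j,k\le n} a_j b_k - \sum_{1\le j,k\le n} a_k b_j + \sum_{1\le j,k\le n} a_k b_k \\
&= 2 \sum_{1\le j,k\le n} a_k b_k - 2 \sum_{1\le j,k\le n} a_j b_k \\
&= 2n \sum_{1\le k\le n} a_k b_k - 2 \Bigl( \sum_{k=1}^{n} a_k \Bigr) \Bigl( \sum_{k=1}^{n} b_k \Bigr).
\end{aligned}$$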
In the last step both sums have been simplified according to the general
distributive law (2.28). If the manipulation of the first sum seems mysterious,
here it is again in slow motion:
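
$$\begin{aligned}
2 \sum_{1\le j,k\le n} a_k b_k &= 2 \sum_{1\le k\le n}\ \sum_{1\le j\le n} a_k b_k \\
&= 2 \sum_{1\le k\le n} a_k b_k \sum_{1\le j\le n} 1 \\
&= 2 \sum_{1\le k\le n} a_k b_k\,n = 2n \sum_{1\le k\le n} a_k b_k.
\end{aligned}$$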
An index variable that doesn’t appear in the summand (here j) can simply
be eliminated if we multiply what’s left by the size of that variable’s index
set (here n).
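Returning to where we left off, we can now divide everything by 2 and
rearrange things to obtain an interesting formula:

$$\Bigl( \sum_{k=1}^{n} a_k \Bigr) \Bigl( \sum_{k=1}^{n} b_k \Bigr) = n \sum_{k=1}^{n} a_k b_k - \sum_{1\le j<k\le n} (a_k - a_j)(b_k - b_j). \tag{2.34}$$

This identity yields Chebyshev’s summation inequalities as a special case:

$$\Bigl( \sum_{k=1}^{n} a_k \Bigr) \Bigl( \sum_{k=1}^{n} b_k \Bigr) \le n \sum_{k=1}^{n} a_k b_k, \quad \text{if } a_1 \le \cdots \le a_n \text{ and } b_1 \le \cdots \le b_n;$$

$$\Bigl( \sum_{k=1}^{n} a_k \Bigr) \Bigl( \sum_{k=1}^{n} b_k \Bigr) \ge n \sum_{k=1}^{n} a_k b_k, \quad \text{if } a_1 \le \cdots \le a_n \text{ and } b_1 \ge \cdots \ge b_n.$$

(In general, if $a_1 \le \cdots \le a_n$ and if p is a permutation of {1, ..., n}, it’s
possible to prove that the largest value of $\sum_{k=1}^{n} a_k b_{p(k)}$ occurs when
$b_{p(1)} \le \cdots \le b_{p(n)}$, and the smallest value occurs when $b_{p(1)} \ge \cdots \ge b_{p(n)}$.)

[Margin note: (Chebyshev actually proved the analogous result for integrals instead of sums: $\bigl(\int_a^b f(x)\,dx\bigr)\bigl(\int_a^b g(x)\,dx\bigr) \le (b-a)\cdot\bigl(\int_a^b f(x)g(x)\,dx\bigr)$, if f(x) and g(x) are monotone nondecreasing functions.)]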
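Identity (2.34) and the two Chebyshev inequalities can be spot-checked on small integer sequences. The Python sketch below uses hypothetical function names and 0-based lists in place of $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$.

```python
def product_of_sums(a, b):
    """Left side of (2.34)."""
    return sum(a) * sum(b)

def n_dot_minus_cross(a, b):
    """Right side of (2.34): n * sum a_k b_k minus the sum over j < k."""
    n = len(a)
    cross = sum((a[k] - a[j]) * (b[k] - b[j])
                for j in range(n) for k in range(j + 1, n))
    return n * sum(x * y for x, y in zip(a, b)) - cross

def n_dot(a, b):
    """n * sum a_k b_k, the quantity Chebyshev's inequalities compare against."""
    return len(a) * sum(x * y for x, y in zip(a, b))
```

With a = [1, 4, 6] and b = [2, 3, 9] both nondecreasing, the product of sums is at most `n_dot(a, b)`; reversing b flips the inequality.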
Multiple summation has an interesting connection with the general op-
eration of changing the index of summation in single sums. We know by the
commutative law that

$$\sum_{k\in K} a_k = \sum_{p(k)\in K} a_{p(k)}$$

if p(k) is any permutation of the integers. But what happens when we replace
k by f(j), where f is an arbitrary function

$$f\colon J \to K$$

that takes an integer j ∈ J into an integer f(j) ∈ K? The general formula for
index replacement is

$$\sum_{j\in J} a_{f(j)} = \sum_{k\in K} a_k\,\#f^{-}(k), \tag{2.35}$$

where $\#f^{-}(k)$ stands for the number of elements in the set

$$f^{-}(k) = \{\, j \mid f(j) = k \,\},$$

that is, the number of values of j ∈ J such that f(j) equals k.
It’s easy to prove (2.35) by interchanging the order of summation,

$$\sum_{j\in J} a_{f(j)} = \sum_{\substack{j\in J \\ k\in K}} a_k\,[f(j)=k] = \sum_{k\in K} a_k \sum_{j\in J} [f(j)=k],$$

since $\sum_{j\in J} [f(j)=k] = \#f^{-}(k)$. In the special case that f is a one-to-one
correspondence between J and K, we have $\#f^{-}(k) = 1$ for all k, and the
general formula (2.35) reduces to

$$\sum_{j\in J} a_{f(j)} = \sum_{f(j)\in K} a_{f(j)} = \sum_{k\in K} a_k.$$

[Margin note: My other math teacher calls this a “bijection”; maybe I’ll learn to love that word some day. And then again...]
This is the commutative law (2.17) we had before, slightly disguised.
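The index-replacement formula (2.35) amounts to counting how many j land on each k. A minimal Python sketch, with hypothetical function names and `collections.Counter` standing in for $\#f^{-}(k)$:

```python
from collections import Counter

def index_replacement_lhs(a, f, J):
    """Left side of (2.35): sum of a_{f(j)} over j in J."""
    return sum(a[f(j)] for j in J)

def index_replacement_rhs(a, f, J, K):
    """Right side of (2.35): sum over k in K of a_k times #f^-(k)."""
    preimage_size = Counter(f(j) for j in J)   # counts j in J with f(j) = k
    return sum(a[k] * preimage_size[k] for k in K)

# Sample data: f(j) = j^2 maps J = {-3, ..., 3} into K = {0, ..., 9}.
J = range(-3, 4)
K = range(0, 10)
a = {k: k + 1 for k in K}
square = lambda j: j * j
```

Here k = 1, 4, 9 each have two preimages, k = 0 has one, and every other k in K has none, so those terms drop out.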
Our examples of multiple sums so far have all involved general terms like
$a_k$ or $b_k$. But this book is supposed to be concrete, so let’s take a look at a
multiple sum that involves actual numbers:

$$S_n = \sum_{1\le j<k\le n} \frac{1}{k-j}.$$
The normal way to evaluate a double sum is to sum first on j or first
on k, so let’s explore both options.
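
$$\begin{aligned}
S_n &= \sum_{1\le k\le n}\ \sum_{1\le j<k} \frac{1}{k-j} && \text{summing first on } j \\
&= \sum_{1\le k\le n}\ \sum_{1\le k-j<k} \frac1j && \text{replacing } j \text{ by } k-j \\
&= \sum_{1\le k\le n}\ \sum_{0<j\le k-1} \frac1j && \text{simplifying the bounds on } j \\
&= \sum_{1\le k\le n} H_{k-1} && \text{by (2.13), the definition of } H_{k-1} \\
&= \sum_{1\le k+1\le n} H_k && \text{replacing } k \text{ by } k+1 \\
&= \sum_{0\le k<n} H_k. && \text{simplifying the bounds on } k
\end{aligned}$$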
Alas! We don’t know how to get a sum of harmonic numbers into closed form.
If we try summing first the other way, we get
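
$$\begin{aligned}
S_n &= \sum_{1\le j\le n}\ \sum_{j<k\le n} \frac{1}{k-j} && \text{summing first on } k \\
&= \sum_{1\le j\le n}\ \sum_{j<k+j\le n} \frac1k && \text{replacing } k \text{ by } k+j \\
&= \sum_{1\le j\le n}\ \sum_{0<k\le n-j} \frac1k && \text{simplifying the bounds on } k \\
&= \sum_{1\le j\le n} H_{n-j} && \text{by (2.13), the definition of } H_{n-j} \\
&= \sum_{1\le n-j\le n} H_j && \text{replacing } j \text{ by } n-j \\
&= \sum_{0\le j<n} H_j. && \text{simplifying the bounds on } j
\end{aligned}$$

[Margin note: Get out the whip.]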
We’re back at the same impasse.
But there’s another way to proceed, if we replace k by k + j before
deciding to reduce
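$S_n$ to a sum of sums:

$$\begin{aligned}
S_n &= \sum_{1\le j<k\le n} \frac{1}{k-j} && \text{recopying the given sum} \\
&= \sum_{1\le j<k+j\le n} \frac1k && \text{replacing } k \text{ by } k+j \\
&= \sum_{1\le k\le n}\ \sum_{1\le j\le n-k} \frac1k && \text{summing first on } j \\
&= \sum_{1\le k\le n} \frac{n-k}{k} && \text{the sum on } j \text{ is trivial} \\
&= \sum_{1\le k\le n} \frac{n}{k} - \sum_{1\le k\le n} 1 && \text{by the associative law} \\
&= n \Bigl( \sum_{1\le k\le n} \frac1k \Bigr) - n && \text{by gosh} \\
&= n H_n - n. && \text{by (2.13), the definition of } H_n
\end{aligned}$$

Aha! We’ve found $S_n$. Combining this with the false starts we made gives us
a further identity as a bonus:

$$\sum_{0\le k<n} H_k = n H_n - n. \tag{2.36}$$

[Margin note: It was smart to say k ≤ n instead of k ≤ n − 1 in this derivation. Simple bounds save energy.]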
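All three expressions for $S_n$ can be computed exactly and compared. In the Python sketch below the function names are hypothetical; `S_direct` evaluates the double sum as given, `S_closed` evaluates $nH_n - n$, and `harmonic_prefix_sum` evaluates the left side of (2.36).

```python
from fractions import Fraction

def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n, with H_0 = 0."""
    return sum(Fraction(1, k) for k in range(1, n + 1))

def S_direct(n):
    """Sum of 1/(k - j) over all pairs with 1 <= j < k <= n."""
    return sum(Fraction(1, k - j)
               for j in range(1, n + 1) for k in range(j + 1, n + 1))

def S_closed(n):
    """n*H_n - n."""
    return n * harmonic(n) - n

def harmonic_prefix_sum(n):
    """Sum of H_k over 0 <= k < n."""
    return sum(harmonic(k) for k in range(n))
```

For n = 3 the direct sum is 1/1 + 1/2 + 1/1, and the other two functions should return the same value.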
We can understand the trick that worked here in two ways, one algebraic
and one geometric. (1) Algebraically, if we have a double sum whose terms in-
volve k+f( j), where f is an arbitrary function, this example indicates that it’s
a good idea to try replacing k by k-f(j) and summing on j. (2) Geometrically,
we can look at this particular sum
$S_n$ as follows, in the case n = 4:

           k=1     k=2     k=3     k=4
    j=1            1/1  +  1/2  +  1/3
    j=2                    1/1  +  1/2
    j=3                            1/1
    j=4

Our first attempts, summing first on j (by columns) or on k (by rows), gave
us $H_1 + H_2 + H_3 = H_3 + H_2 + H_1$. The winning idea was essentially to sum
by diagonals, getting $\frac31 + \frac22 + \frac13$.
2.5 GENERAL METHODS
Now let’s consolidate what we’ve learned, by looking at a single
example from several different angles. On the next few pages we’re going to
try to find a closed form for the sum of the first n squares, which we’ll call
$\Box_n$:

$$\Box_n = \sum_{0\le k\le n} k^2, \quad \text{for } n \ge 0. \tag{2.37}$$
We’ll see that there are at least seven different ways to solve this problem,
and in the process we’ll learn useful strategies for attacking sums in general.
First, as usual, we look at some small cases.

    n        0   1   2   3   4   5   6    7    8    9   10   11   12
    n^2      0   1   4   9  16  25  36   49   64   81  100  121  144
    □_n      0   1   5  14  30  55  91  140  204  285  385  506  650

No closed form for $\Box_n$ is immediately evident; but when we do find one, we
can use these values as a check.
Method 0: You could look it up.
A problem like the sum of the first n squares has probably been solved
before, so we can most likely find the solution in a handy reference book.
Sure enough, page 72 of the CRC Standard Mathematical Tables [24] has the
answer:

$$\Box_n = \frac{n(n+1)(2n+1)}{6}, \quad \text{for } n \ge 0. \tag{2.38}$$

Just to make sure we haven’t misread it, we check that this formula correctly
gives $\Box_5 = 5\cdot6\cdot11/6 = 55$. Incidentally, page 72 of the CRC Tables has
further information about the sums of cubes, ..., tenth powers.
The definitive reference for mathematical formulas is the Handbook of
Mathematical Functions, edited by Abramowitz and Stegun [2]. Pages 813-814
of that book list the values of $\Box_n$ for n ≤ 100; and pages 804 and 809
exhibit formulas equivalent to (2.38), together with the analogous formulas
for sums of cubes, ..., fifteenth powers, with or without alternating signs.

[Margin note: (Harder sums can be found in Hansen’s comprehensive table [147].)]
But the best source for answers to questions about sequences is an amaz-
ing little book called the Handbook of Integer Sequences, by Sloane [270],
which lists thousands of sequences by their numerical values. If you come
up with a recurrence that you suspect has already been studied, all you have
to do is compute enough terms to distinguish your recurrence from other fa-
mous ones; then chances are you’ll find a pointer to the relevant literature in
Sloane’s Handbook. For example, 1, 5, 14, 30, ... turns out to be Sloane’s
sequence number 1574, and it’s called the sequence of “square pyramidal
numbers” (because there are $\Box_n$ balls in a pyramid that has a square base of
$n^2$ balls). Sloane gives three references, one of which is to the handbook of
Abramowitz and Stegun that we’ve already mentioned.
Still another way to probe the world’s store of accumulated mathematical
wisdom is to use a computer program (such as MACSYMA) that provides
tools for symbolic manipulation. Such programs are indispensable, especially
for people who need to deal with large formulas.
It’s good to be familiar with standard sources of information, because
they can be extremely helpful. But Method 0 isn’t really consistent with the
spirit of this book, because we want to know how to figure out the answers
by ourselves. The look-up method is limited to problems that other people
have decided are worth considering; a new problem won’t be there.

[Margin note: Or, at least to problems having the same answers as problems that other people have decided to consider.]

Method 1: Guess the answer, prove it by induction.
Perhaps a little bird has told us the answer to a problem, or we have
arrived at a closed form by some other less-than-rigorous means. Then we
merely have to prove that it is correct.
We might, for example, have noticed that the values of $\Box_n$ have rather
small prime factors, so we may have come up with formula (2.38) as something
that works for all small values of n. We might also have conjectured the
equivalent formula

$$\Box_n = \frac{n(n+\frac12)(n+1)}{3}, \quad \text{for } n \ge 0, \tag{2.39}$$

which is nicer because it’s easier to remember. The preponderance of the
evidence supports (2.39), but we must prove our conjectures beyond all
reasonable doubt. Mathematical induction was invented for this purpose.
“Well, Your Honor, we know that $\Box_0 = 0 = 0(0+\frac12)(0+1)/3$, so the basis
is easy. For the induction, suppose that n > 0, and assume that (2.39) holds
when n is replaced by n − 1. Since

$$\Box_n = \Box_{n-1} + n^2,$$

we have

$$\begin{aligned}
3\Box_n &= (n-1)(n-\tfrac12)(n) + 3n^2 \\
&= (n^3 - \tfrac32 n^2 + \tfrac12 n) + 3n^2 \\
&= (n^3 + \tfrac32 n^2 + \tfrac12 n) \\
&= n(n+\tfrac12)(n+1).
\end{aligned}$$

Therefore (2.39) indeed holds, beyond a reasonable doubt, for all n ≥ 0.”
Judge Wapner, in his infinite wisdom, agrees.
Induction has its place, and it is somewhat more defensible than trying
to look up the answer. But it’s still not really what we’re seeking. All of
the other sums we have evaluated so far in this chapter have been conquered
without induction; we should likewise be able to determine a sum like $\Box_n$
from scratch. Flashes of inspiration should not be necessary. We should be
able to do sums even on our less creative days.
Method 2: Perturb the sum.
So let’s go back to the perturbation method that worked so well for the
geometric progression (2.25). We extract the first and last terms of $\Box_{n+1}$ in
order to get an equation for $\Box_n$:

$$\begin{aligned}
\Box_n + (n+1)^2 = \sum_{0\le k\le n} (k+1)^2 &= \sum_{0\le k\le n} (k^2 + 2k + 1) \\
&= \sum_{0\le k\le n} k^2 + 2 \sum_{0\le k\le n} k + \sum_{0\le k\le n} 1 \\
&= \Box_n + 2 \sum_{0\le k\le n} k + (n+1).
\end{aligned}$$

Oops, the $\Box_n$’s cancel each other. Occasionally, despite our best efforts, the
perturbation method produces something like $\Box_n = \Box_n$, so we lose.

[Margin note: Seems more like a draw.]

On the other hand, this derivation is not a total loss; it does reveal a way
to sum the first n integers in closed form,

$$2 \sum_{0\le k\le n} k = (n+1)^2 - (n+1),$$

even though we’d hoped to discover the sum of first integers squared. Could
it be that if we start with the sum of the integers cubed, which we might
call $\boxtimes_n$, we will get an expression for the integers squared? Let’s try it.

$$\begin{aligned}
\boxtimes_n + (n+1)^3 = \sum_{0\le k\le n} (k+1)^3 &= \sum_{0\le k\le n} (k^3 + 3k^2 + 3k + 1) \\
&= \boxtimes_n + 3\Box_n + 3\,\frac{(n+1)n}{2} + (n+1).
\end{aligned}$$

Sure enough, the $\boxtimes_n$’s cancel, and we have enough information to determine
$\Box_n$ without relying on induction:

$$\begin{aligned}
3\Box_n &= (n+1)^3 - 3(n+1)n/2 - (n+1) \\
&= (n+1)(n^2 + 2n + 1 - \tfrac32 n - 1) = (n+1)(n+\tfrac12)n.
\end{aligned}$$

[Margin note: Method 2′: Perturb your TA.]
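The result of perturbing the cube sum can be checked against a term-by-term sum and against (2.38). This Python sketch uses hypothetical function names and exact rational arithmetic.

```python
from fractions import Fraction

def squares_direct(n):
    """0^2 + 1^2 + ... + n^2, added term by term."""
    return sum(k * k for k in range(n + 1))

def squares_from_cube_perturbation(n):
    """Solve 3*Box_n = (n+1)^3 - 3(n+1)n/2 - (n+1) for Box_n."""
    three_box = (n + 1) ** 3 - Fraction(3 * (n + 1) * n, 2) - (n + 1)
    return three_box / 3

def squares_closed(n):
    """(2.38): n(n+1)(2n+1)/6."""
    return Fraction(n * (n + 1) * (2 * n + 1), 6)
```

All three functions should agree on the tabulated small cases, for example 55 at n = 5 and 650 at n = 12.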
Method 3: Build a repertoire.
A slight generalization of the recurrence (2.7) will also suffice for summands
involving $n^2$. The solution to

$$R_0 = \alpha; \qquad R_n = R_{n-1} + \beta + \gamma n + \delta n^2, \quad \text{for } n > 0, \tag{2.40}$$

will be of the general form

$$R_n = A(n)\,\alpha + B(n)\,\beta + C(n)\,\gamma + D(n)\,\delta; \tag{2.41}$$

and we have already determined A(n), B(n), and C(n), because (2.41) is the
same as (2.7) when δ = 0. If we now plug in $R_n = n^3$, we find that $n^3$ is the
solution when α = 0, β = 1, γ = −3, δ = 3. Hence

$$3D(n) - 3C(n) + B(n) = n^3;$$

this determines D(n).
We’re interested in the sum $\Box_n$, which equals $\Box_{n-1} + n^2$; thus we get
$\Box_n = R_n$ if we set α = β = γ = 0 and δ = 1 in (2.41). Consequently
$\Box_n = D(n)$. We needn’t do the algebra to compute D(n) from B(n) and
C(n), since we already know what the answer will be; but doubters among us
should be reassured to find that

$$3D(n) = n^3 + 3C(n) - B(n) = n^3 + 3\,\frac{(n+1)n}{2} - n = n(n+\tfrac12)(n+1).$$

[Margin note: The horizontal scale here is ten times the vertical scale.]
Method 4: Replace sums by integrals.
People who have been raised on calculus instead of discrete mathematics
tend to be more familiar with ∫ than with Σ, so they find it natural to try
changing Σ to ∫. One of our goals in this book is to become so comfortable
with Σ that we’ll think ∫ is more difficult than Σ (at least for exact results).
But still, it’s a good idea to explore the relation between Σ and ∫, since
summation and integration are based on very similar ideas.
In calculus, an integral can be regarded as the area under a curve, and we
can approximate this area by adding up the areas of long, skinny rectangles
that touch the curve. We can also go the other way if a collection of long,
skinny rectangles is given: Since $\Box_n$ is the sum of the areas of rectangles
whose sizes are 1 × 1, 1 × 4, ..., 1 × $n^2$, it is approximately equal to the area
under the curve $f(x) = x^2$ between 0 and n.

[Figure: the curve f(x) = x² plotted against x from 0 to n, together with the unit-width rectangles.]

The area under this curve is $\int_0^n x^2\,dx = n^3/3$; therefore we know that $\Box_n$ is
approximately $\frac13 n^3$.
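One way to use this fact is to examine the error in the approximation,
$E_n = \Box_n - \frac13 n^3$. Since $\Box_n$ satisfies the recurrence $\Box_n = \Box_{n-1} + n^2$, we find
that $E_n$ satisfies the simpler recurrence

$$\begin{aligned}
E_n = \Box_n - \tfrac13 n^3 &= \Box_{n-1} + n^2 - \tfrac13 n^3 \\
&= E_{n-1} + \tfrac13 (n-1)^3 + n^2 - \tfrac13 n^3 \\
&= E_{n-1} + n - \tfrac13.
\end{aligned}$$

Another way to pursue the integral approach is to find a formula for $E_n$ by
summing the areas of the wedge-shaped error terms. We have

$$\begin{aligned}
\Box_n - \int_0^n x^2\,dx &= \sum_{k=1}^{n} \Bigl( k^2 - \int_{k-1}^{k} x^2\,dx \Bigr) \\
&= \sum_{k=1}^{n} \Bigl( k^2 - \frac{k^3 - (k-1)^3}{3} \Bigr) = \sum_{k=1}^{n} \bigl( k - \tfrac13 \bigr).
\end{aligned}$$

Either way, we could find $E_n$ and then $\Box_n$.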
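The error-term recurrence is simple enough to run. This Python sketch (hypothetical function names) accumulates $E_n$ from $E_0 = 0$ and $E_n = E_{n-1} + n - \frac13$, then recovers the sum of squares as $\frac13 n^3 + E_n$.

```python
from fractions import Fraction

def error_term(n):
    """E_n from E_0 = 0 and E_n = E_{n-1} + n - 1/3."""
    e = Fraction(0)
    for m in range(1, n + 1):
        e += m - Fraction(1, 3)
    return e

def squares_via_integral(n):
    """Sum of the first n squares as n^3/3 plus the accumulated error E_n."""
    return Fraction(n ** 3, 3) + error_term(n)
```

Comparing `squares_via_integral(n)` with a term-by-term sum of k² for small n confirms that the approximation plus its error is exact.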
Method 5: Expand and contract.
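Yet another way to discover a closed form for $\Box_n$ is to replace the original
sum by a seemingly more complicated double sum that can actually be
simplified if we massage it properly:

$$\begin{aligned}
\Box_n = \sum_{1\le k\le n} k^2 &= \sum_{1\le j\le k\le n} k \\
&= \sum_{1\le j\le n}\ \sum_{j\le k\le n} k \\
&= \sum_{1\le j\le n} \Bigl( \frac{j+n}{2} \Bigr) (n-j+1) \\
&= \tfrac12 \sum_{1\le j\le n} \bigl( n(n+1) + j - j^2 \bigr) \\
&= \tfrac12 n^2(n+1) + \tfrac14 n(n+1) - \tfrac12 \Box_n = \tfrac12 n(n+\tfrac12)(n+1) - \tfrac12 \Box_n.
\end{aligned}$$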
Going from a single sum to a double sum may appear at first to be a backward
step, but it’s actually progress, because it produces sums that are easier to
work with. We can’t expect to solve every problem by continually simplifying,
simplifying, and simplifying: You can’t scale the highest mountain peaks by
climbing only uphill!
Method 6: Use finite calculus.
Method 7: Use generating functions.
Stay tuned for still more exciting calculations of $\Box_n = \sum_{k=0}^n k^2$, as we
learn further techniques in the next section and in later chapters.
[Margin note: This is for people addicted to calculus.]
[Margin note: (The last step here is something like the last step of the perturbation method, because we get an equation with the unknown quantity on both sides.)]
2.6 FINITE AND INFINITE CALCULUS
We’ve learned a variety of ways to deal with sums directly. Now it’s
time to acquire a broader perspective, by looking at the problem of summa-
tion from a higher level. Mathematicians have developed a “finite calculus,”
analogous to the more traditional infinite calculus, by which it’s possible to
approach summation in a nice, systematic fashion.
Infinite calculus is based on the properties of the derivative operator D,
defined by
$$Df(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}.$$
Finite calculus is based on the properties of the difference operator Δ, defined
by
$$\Delta f(x) = f(x+1) - f(x). \tag{2.42}$$
[Margin note: As opposed to a cassette function.]
This is the finite analog of the derivative in which we restrict ourselves to
positive integer values of h. Thus, h = 1 is the closest we can get to the
“limit” as $h \to 0$, and $\Delta f(x)$ is the value of $(f(x+h) - f(x))/h$ when h = 1.
The symbols D and Δ are called operators because they operate on
functions to give new functions; they are functions of functions that produce
functions. If f is a suitably smooth function of real numbers to real numbers,
then Df is also a function from reals to reals. And if f is any real-to-real
function, so is Δf. The values of the functions Df and Δf at a point x are
given by the definitions above.
Early on in calculus we learn how D operates on the powers $f(x) = x^m$.
In such cases $Df(x) = mx^{m-1}$. We can write this informally with f omitted,
$$D(x^m) = mx^{m-1}.$$
It would be nice if the Δ operator would produce an equally elegant result;
unfortunately it doesn’t. We have, for example,
$$\Delta(x^3) = (x+1)^3 - x^3 = 3x^2 + 3x + 1.$$
[Margin note: Math power.]
But there is a type of “mth power” that does transform nicely under Δ,
and this is what makes finite calculus interesting. Such newfangled mth
powers are defined by the rule
$$x^{\underline{m}} = \overbrace{x(x-1)\ldots(x-m+1)}^{m\ \text{factors}}, \qquad \text{integer } m \ge 0. \tag{2.43}$$
Notice the little straight line under the m; this implies that the m factors
are supposed to go down and down, stepwise. There’s also a corresponding
definition where the factors go up and up:
$$x^{\overline{m}} = \overbrace{x(x+1)\ldots(x+m-1)}^{m\ \text{factors}}, \qquad \text{integer } m \ge 0. \tag{2.44}$$
When m = 0, we have $x^{\underline{0}} = x^{\overline{0}} = 1$, because a product of no factors is
conventionally taken to be 1 (just as a sum of no terms is conventionally 0).
The quantity $x^{\underline{m}}$ is called “x to the m falling,” if we have to read it
aloud; similarly, $x^{\overline{m}}$ is “x to the m rising.” These functions are also called
falling factorial powers and rising factorial powers, since they are closely
related to the factorial function $n! = n(n-1)\ldots(1)$. In fact, $n! = n^{\underline{n}} = 1^{\overline{n}}$.
Several other notations for factorial powers appear in the mathematical
literature, notably “Pochhammer’s symbol” $(x)_m$ for $x^{\overline{m}}$ or $x^{\underline{m}}$; notations
like $x^{(m)}$ or $x_{(m)}$ are also seen for $x^{\underline{m}}$. But the underline/overline convention
is catching on, because it’s easy to write, easy to remember, and free of
redundant parentheses.
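Falling powers $x^{\underline{m}}$ are especially nice with respect to Δ. We have
$$\begin{aligned}
\Delta(x^{\underline{m}}) &= (x+1)^{\underline{m}} - x^{\underline{m}} \\
&= (x+1)x\ldots(x-m+2) - x\ldots(x-m+2)(x-m+1) \\
&= m\,x(x-1)\ldots(x-m+2),
\end{aligned}$$
hence the finite calculus has a handy law to match $D(x^m) = mx^{m-1}$:
$$\Delta(x^{\underline{m}}) = m\,x^{\underline{m-1}}. \tag{2.45}$$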
This is the basic factorial fact.
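As a quick sanity check of (2.45), here is a minimal sketch in Python. The names `falling`, `rising` and `delta` are chosen for this illustration only, and the exponent m is assumed to be a nonnegative integer, as in (2.43) and (2.44).

```python
def falling(x, m):
    """x to the m falling: x(x-1)...(x-m+1), for integer m >= 0."""
    product = 1
    for i in range(m):
        product *= x - i
    return product

def rising(x, m):
    """x to the m rising: x(x+1)...(x+m-1), for integer m >= 0."""
    product = 1
    for i in range(m):
        product *= x + i
    return product

def delta(f):
    """Difference operator: return the function x -> f(x+1) - f(x)."""
    return lambda x: f(x + 1) - f(x)

# Ordinary cube: the difference picks up the extra terms 3x + 1.
cube_difference = delta(lambda x: x ** 3)

# Falling cube: the difference is exactly 3 times the falling square.
falling_cube_difference = delta(lambda x: falling(x, 3))
```

Comparing `cube_difference(x)` with `3*x**2 + 3*x + 1` and `falling_cube_difference(x)` with `3 * falling(x, 2)` over a range of integers shows the contrast the text describes.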
The operator D of infinite calculus has an inverse, the anti-derivative
(or integration) operator $\int$. The Fundamental Theorem of Calculus relates D
to $\int$:
$$g(x) = Df(x) \quad\text{if and only if}\quad \int g(x)\,dx = f(x) + C.$$
Here $\int g(x)\,dx$, the indefinite integral of g(x), is the class of functions whose
derivative is g(x). Analogously, Δ has as an inverse, the anti-difference (or
summation) operator $\sum$; and there’s another Fundamental Theorem:
$$g(x) = \Delta f(x) \quad\text{if and only if}\quad \sum g(x)\,\delta x = f(x) + C. \tag{2.46}$$
Here $\sum g(x)\,\delta x$, the indefinite sum of g(x), is the class of functions whose
difference is g(x). (Notice that the lowercase δ relates to uppercase Δ as
d relates to D.) The “C” for indefinite integrals is an arbitrary constant; the
“C” for indefinite sums is any function p(x) such that p(x + 1) = p(x). For
example, C might be the periodic function $a + b\sin 2\pi x$; such functions get
washed out when we take differences, just as constants get washed out when
we take derivatives. At integer values of x, the function C is constant.
[Margin note: Mathematical terminology is sometimes crazy: Pochhammer [234] actually used the notation $(x)_m$ for the binomial coefficient $\binom{x}{m}$, not for factorial powers.]
[Margin note: “Quemadmodum ad differentiam denotandam usi sumus signo Δ, ita summam indicabimus signo Σ. ... ex quo æquatio z = Δy, si invertatur, dabit quoque y = Σz + C.” - L. Euler [88]]
Now we’re almost ready for the punch line. Infinite calculus also has
definite integrals: If g(x) = Df(x), then
$$\int_a^b g(x)\,dx = f(x)\Big|_a^b = f(b) - f(a).$$
Therefore finite calculus, ever mimicking its more famous cousin, has
definite sums: If g(x) = Δf(x), then
$$\sum_a^b g(x)\,\delta x = f(x)\Big|_a^b = f(b) - f(a). \tag{2.47}$$
This formula gives a meaning to the notation $\sum_a^b g(x)\,\delta x$, just as the previous
formula defines $\int_a^b g(x)\,dx$.
But what does $\sum_a^b g(x)\,\delta x$ really mean, intuitively? We’ve defined it by
analogy, not by necessity. We want the analogy to hold, so that we can easily
remember the rules of finite calculus; but the notation will be useless if we
don’t understand its significance. Let’s try to deduce its meaning by looking
first at some special cases, assuming that g(x) = Δf(x) = f(x + 1) - f(x). If
b = a, we have
$$\sum_a^a g(x)\,\delta x = f(a) - f(a) = 0.$$
Next, if b = a + 1, the result is
$$\sum_a^{a+1} g(x)\,\delta x = f(a+1) - f(a) = g(a).$$
More generally, if b increases by 1, we have
$$\sum_a^{b+1} g(x)\,\delta x - \sum_a^{b} g(x)\,\delta x = \bigl(f(b+1) - f(a)\bigr) - \bigl(f(b) - f(a)\bigr) = f(b+1) - f(b) = g(b).$$
These observations, and mathematical induction, allow us to deduce exactly
what $\sum_a^b g(x)\,\delta x$ means in general, when a and b are integers with $b \ge a$:
$$\sum_a^b g(x)\,\delta x = \sum_{k=a}^{b-1} g(k) = \sum_{a \le k < b} g(k), \qquad \text{for integers } b \ge a. \tag{2.48}$$
In other words, the definite sum is the same as an ordinary sum with limits,
but excluding the value at the upper limit.
[Margin note: You call this a punch line?]
Let’s try to recap this in a slightly different way. Suppose we’ve been
given an unknown sum that’s supposed to be evaluated in closed form, and
suppose we can write it in the form $\sum_{a \le k < b} g(k) = \sum_a^b g(x)\,\delta x$. The theory
of finite calculus tells us that we can express the answer as f(b) - f(a), if
we can only find an indefinite sum or anti-difference function f such that
g(x) = f(x + 1) - f(x). One way to understand this principle is to write
$\sum_{a \le k < b} g(k)$ out in full, using the three-dots notation:
$$\sum_{a \le k < b} \bigl(f(k+1) - f(k)\bigr) = \bigl(f(a+1) - f(a)\bigr) + \bigl(f(a+2) - f(a+1)\bigr) + \cdots + \bigl(f(b-1) - f(b-2)\bigr) + \bigl(f(b) - f(b-1)\bigr).$$
Everything on the right-hand side cancels, except f(b) - f(a); so f(b) - f(a)
is the value of the sum. (Sums of the form $\sum_{a \le k < b} (f(k+1) - f(k))$ are
often called telescoping, by analogy with a collapsed telescope, because the
thickness of a collapsed telescope is determined solely by the outer radius of
the outermost tube and the inner radius of the innermost tube.)
[Margin note: And all this time I thought it was telescoping because it collapsed from a very long expression to a very short one.]
But rule (2.48) applies only when $b \ge a$; what happens if b < a? Well,
(2.47) says that we must have
$$\sum_a^b g(x)\,\delta x = f(b) - f(a) = -\bigl(f(a) - f(b)\bigr) = -\sum_b^a g(x)\,\delta x.$$
This is analogous to the corresponding equation for definite integration. A
similar argument proves $\sum_a^b + \sum_b^c = \sum_a^c$, the summation analog of the
identity $\int_a^b + \int_b^c = \int_a^c$. In full garb,
$$\sum_a^b g(x)\,\delta x + \sum_b^c g(x)\,\delta x = \sum_a^c g(x)\,\delta x, \tag{2.49}$$
for all integers a, b, and c.
At this point a few of us are probably starting to wonder what all these
parallels and analogies buy us.
[Margin note: Others have been wondering this for some time now.]
Well for one, definite summation gives us a
simple way to compute sums of falling powers: The basic laws (2.45), (2.47),
and (2.48) imply the general law
$$\sum_{0 \le k < n} k^{\underline{m}} = \frac{k^{\underline{m+1}}}{m+1}\bigg|_0^n = \frac{n^{\underline{m+1}}}{m+1}, \qquad \text{for integers } m, n \ge 0. \tag{2.50}$$
This formula is easy to remember because it’s so much like the familiar
$\int_0^n x^m\,dx = n^{m+1}/(m+1)$.
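Law (2.50) is easy to test by brute force. The sketch below uses invented helper names (`falling`, `closed_form`, `direct_sum`) and restricts itself to nonnegative integers m and n, exactly the range the law is stated for.

```python
def falling(x, m):
    """x to the m falling: x(x-1)...(x-m+1), for integer m >= 0."""
    product = 1
    for i in range(m):
        product *= x - i
    return product

def closed_form(n, m):
    """Right-hand side of (2.50): n to the (m+1) falling, divided by m+1.

    The division is exact, because a product of m+1 consecutive
    integers is always a multiple of m+1.
    """
    return falling(n, m + 1) // (m + 1)

def direct_sum(n, m):
    """Left-hand side of (2.50): add k to the m falling over 0 <= k < n."""
    return sum(falling(k, m) for k in range(n))
```

For instance, `closed_form(5, 2)` and `direct_sum(5, 2)` both come from the terms 0, 0, 2, 6, 12.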
In particular, when m = 1 we have $k^{\underline{1}} = k$, so the principles of finite
calculus give us an easy way to remember the fact that
$$\sum_{0 \le k < n} k = \frac{n^{\underline{2}}}{2} = n(n-1)/2.$$
The definite-sum method also gives us an inkling that sums over the range
$0 \le k < n$ often turn out to be simpler than sums over $1 \le k \le n$; the former
are just f(n) - f(0), while the latter must be evaluated as f(n + 1) - f(1).
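Ordinary powers can also be summed in this new way, if we first express
them in terms of falling powers. For example,
$$k^2 = k^{\underline{2}} + k^{\underline{1}},$$
hence
$$\sum_{0 \le k < n} k^2 = \frac{n^{\underline{3}}}{3} + \frac{n^{\underline{2}}}{2} = \tfrac13 n(n-1)(n-2+\tfrac32) = \tfrac13 n(n-\tfrac12)(n-1).$$
Replacing n by n + 1 gives us yet another way to compute the value of our
old friend $\Box_n = \sum_{0 \le k \le n} k^2$ in closed form.
[Margin note: With friends like this...]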
Gee, that was pretty easy. In fact, it was easier than any of the umpteen
other ways that beat this formula to death in the previous section. So let’s
try to go up a notch, from squares to cubes: A simple calculation shows that
$$k^3 = k^{\underline{3}} + 3k^{\underline{2}} + k^{\underline{1}}.$$
(It’s always possible to convert between ordinary powers and factorial powers
by using Stirling numbers, which we will study in Chapter 6.) Thus
$$\sum_{a \le k < b} k^3 = \frac{k^{\underline{4}}}{4} + k^{\underline{3}} + \frac{k^{\underline{2}}}{2}\,\bigg|_a^b.$$
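The conversion of cubes to falling powers, and the resulting anti-difference, can be verified directly. This is a sketch with hypothetical helper names (`falling`, `cube_antidifference`, `sum_cubes`); it assumes integer limits with a at most b.

```python
from fractions import Fraction

def falling(x, m):
    """x to the m falling: x(x-1)...(x-m+1), for integer m >= 0."""
    product = 1
    for i in range(m):
        product *= x - i
    return product

def cube_antidifference(x):
    """f(x) = x^4_/4 + x^3_ + x^2_/2, whose difference is x^3."""
    return (Fraction(falling(x, 4), 4)
            + falling(x, 3)
            + Fraction(falling(x, 2), 2))

def sum_cubes(a, b):
    """Sum of k^3 over a <= k < b, evaluated as f(b) - f(a)."""
    return cube_antidifference(b) - cube_antidifference(a)
```

With these definitions, `sum_cubes(0, 4)` should agree with 0 + 1 + 8 + 27.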
Falling powers are therefore very nice for sums. But do they have any
other redeeming features? Must we convert our old friendly ordinary powers
to falling powers before summing, but then convert back before we can do
anything else? Well, no, it’s often possible to work directly with factorial
powers, because they have additional properties. For example, just as we
have $(x+y)^2 = x^2 + 2xy + y^2$, it turns out that $(x+y)^{\underline{2}} = x^{\underline{2}} + 2x^{\underline{1}}y^{\underline{1}} + y^{\underline{2}}$,
and the same analogy holds between $(x+y)^m$ and $(x+y)^{\underline{m}}$. (This “factorial
binomial theorem” is proved in exercise 5.37.)
So far we’ve considered only falling powers that have nonnegative
exponents. To extend the analogies with ordinary powers to negative exponents,
we need an appropriate definition of $x^{\underline{m}}$ for m < 0. Looking at the sequence
$$\begin{aligned}
x^{\underline{3}} &= x(x-1)(x-2), \\
x^{\underline{2}} &= x(x-1), \\
x^{\underline{1}} &= x, \\
x^{\underline{0}} &= 1,
\end{aligned}$$
we notice that to get from $x^{\underline{3}}$ to $x^{\underline{2}}$ to $x^{\underline{1}}$ to $x^{\underline{0}}$ we divide by x - 2, then
by x - 1, then by x. It seems reasonable (if not imperative) that we should
divide by x + 1 next, to get from $x^{\underline{0}}$ to $x^{\underline{-1}}$, thereby making $x^{\underline{-1}} = 1/(x+1)$.
Continuing, the first few negative-exponent falling powers are
$$\begin{aligned}
x^{\underline{-1}} &= \frac{1}{x+1}, \\
x^{\underline{-2}} &= \frac{1}{(x+1)(x+2)}, \\
x^{\underline{-3}} &= \frac{1}{(x+1)(x+2)(x+3)},
\end{aligned}$$
and our general definition for negative falling powers is
$$x^{\underline{-m}} = \frac{1}{(x+1)(x+2)\ldots(x+m)}, \qquad \text{for } m > 0. \tag{2.51}$$
(It’s also possible to define falling powers for real or even complex m, but we
will defer that until Chapter 5.)
[Margin note: How can a complex number be even?]
With this definition, falling powers have additional nice properties.
Perhaps the most important is a general law of exponents, analogous to the law
$$x^{m+n} = x^m x^n$$
for ordinary powers. The falling-power version is
$$x^{\underline{m+n}} = x^{\underline{m}}\,(x-m)^{\underline{n}}, \qquad \text{integers } m \text{ and } n. \tag{2.52}$$
For example, $x^{\underline{2+3}} = x^{\underline{2}}\,(x-2)^{\underline{3}}$; and with a negative n we have
$$x^{\underline{2-3}} = x^{\underline{2}}\,(x-2)^{\underline{-3}} = x(x-1)\,\frac{1}{(x-1)x(x+1)} = \frac{1}{x+1} = x^{\underline{-1}}.$$
If we had chosen to define $x^{\underline{-1}}$ as 1/x instead of as 1/(x + 1), the law of
exponents (2.52) would have failed in cases like m = -1 and n = 1. In fact,
we could have used (2.52) to tell us exactly how falling powers ought to be
defined in the case of negative exponents, by setting m = -n. When an
existing notation is being extended to cover more cases, it’s always best to
formulate definitions in such a way that general laws continue to hold.
[Margin note: Laws have their exponents and their detractors.]
Now let’s make sure that the crucial difference property holds for our
newly defined falling powers. Does $\Delta x^{\underline{m}} = m\,x^{\underline{m-1}}$ when m < 0? If m = -2,
for example, the difference is
$$\begin{aligned}
\Delta x^{\underline{-2}} &= \frac{1}{(x+2)(x+3)} - \frac{1}{(x+1)(x+2)} \\
&= \frac{(x+1) - (x+3)}{(x+1)(x+2)(x+3)} \\
&= -2\,x^{\underline{-3}}.
\end{aligned}$$
Yes, it works! A similar argument applies for all m < 0.
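Both the law of exponents (2.52) and the difference rule can be spot-checked for negative exponents. The sketch below extends a hypothetical `falling` helper to all integer m following (2.43) and (2.51); it tests only at half-integer values of x, an assumption made here purely so that no denominator can vanish.

```python
from fractions import Fraction

def falling(x, m):
    """x to the m falling for any integer m, per (2.43) and (2.51)."""
    x = Fraction(x)
    product = Fraction(1)
    if m >= 0:
        for i in range(m):
            product *= x - i
        return product
    # Negative exponent: 1 / ((x+1)(x+2)...(x+|m|)).
    for i in range(1, -m + 1):
        product *= x + i
    return 1 / product

def difference_rule_holds(x, m):
    """Check that Delta(x^m_) equals m times x^(m-1)_ at the point x."""
    return falling(x + 1, m) - falling(x, m) == m * falling(x, m - 1)

def exponent_law_holds(x, m, n):
    """Check law (2.52): x^(m+n)_ equals x^m_ times (x-m)^n_."""
    return falling(x, m + n) == falling(x, m) * falling(x - m, n)

# Half-integers such as -11/2, ..., 13/2 never make a factor zero.
half_integers = [Fraction(2 * k + 1, 2) for k in range(-6, 7)]
```

Running the two predicates over `half_integers` with exponents of both signs exercises the cases the text singles out, such as m = -1 and n = 1.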
Therefore the summation property (2.50) holds for negative falling powers
as well as positive ones, as long as no division by zero occurs:
$$\sum_a^b x^{\underline{m}}\,\delta x = \frac{x^{\underline{m+1}}}{m+1}\bigg|_a^b, \qquad \text{for } m \ne -1.$$
But what about when m = -1? Recall that for integration we use
$$\int_a^b x^{-1}\,dx = \ln x\,\Big|_a^b$$
when m = -1. We’d like to have a finite analog of ln x; in other words, we
seek a function f(x) such that
$$x^{\underline{-1}} = \frac{1}{x+1} = \Delta f(x) = f(x+1) - f(x).$$
It’s not too hard to see that
$$f(x) = \frac11 + \frac12 + \cdots + \frac1x$$
is such a function, when x is an integer, and this quantity is just the harmonic
number $H_x$ of (2.13). Thus $H_x$ is the discrete analog of the continuous ln x.
(We will define $H_x$ for noninteger x in Chapter 6, but integer values are good
enough for present purposes. We’ll also see in Chapter 9 that, for large x, the
value of $H_x - \ln x$ is approximately 0.577 + 1/(2x). Hence $H_x$ and ln x are not
only analogous, their values usually differ by less than 1.)
[Margin note: 0.577 exactly? Maybe they mean $1/\sqrt3$. Then again, maybe not.]
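We can now give a complete description of the sums of falling powers:
$$\sum_a^b x^{\underline{m}}\,\delta x = \begin{cases} \dfrac{x^{\underline{m+1}}}{m+1}\bigg|_a^b, & \text{if } m \ne -1; \\[1ex] H_x\,\Big|_a^b, & \text{if } m = -1. \end{cases} \tag{2.53}$$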
This formula indicates why harmonic numbers tend to pop up in the solutions
to discrete problems like the analysis of quicksort, just as so-called natural
logarithms arise naturally in the solutions to continuous problems.
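The m = -1 case of (2.53) and the closeness of harmonic numbers to logarithms can both be observed numerically. In this sketch the function names are made up for the illustration, and the comparison with 0.577 + 1/(2n) is only the rough approximation quoted in the text, not an exact identity.

```python
from fractions import Fraction
import math

def harmonic(n):
    """H_n = 1/1 + 1/2 + ... + 1/n as an exact fraction (H_0 = 0)."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))

def sum_reciprocals(a, b):
    """Add x^(-1)_ = 1/(x+1) term by term over a <= x < b."""
    return sum((Fraction(1, x + 1) for x in range(a, b)), Fraction(0))

def harmonic_minus_log(n):
    """H_n - ln n as a float, for a positive integer n."""
    return float(harmonic(n)) - math.log(n)
```

For integers 0 <= a <= b the term-by-term sum equals `harmonic(b) - harmonic(a)`, which is what the second line of (2.53) asserts.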
Now that we’ve found an analog for ln x, let’s see if there’s one for $e^x$.
What function f(x) has the property that Δf(x) = f(x), corresponding to the
identity $De^x = e^x$? Easy:
$$f(x+1) - f(x) = f(x) \iff f(x+1) = 2f(x);$$
so we’re dealing with a simple recurrence, and we can take $f(x) = 2^x$ as the
discrete exponential function.
The difference of $c^x$ is also quite simple, for arbitrary c, namely
$$\Delta(c^x) = c^{x+1} - c^x = (c-1)c^x.$$
Hence the anti-difference of $c^x$ is $c^x/(c-1)$, if $c \ne 1$. This fact, together with
the fundamental laws (2.47) and (2.48), gives us a tidy way to understand the
general formula for the sum of a geometric progression:
$$\sum_{a \le k < b} c^k = \sum_a^b c^x\,\delta x = \frac{c^x}{c-1}\bigg|_a^b = \frac{c^b - c^a}{c-1}, \qquad \text{for } c \ne 1.$$
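The geometric-progression formula can be confirmed with exact arithmetic. This is a small sketch; `geometric_closed_form` and `geometric_direct` are names invented here, and the limits are assumed to be integers with 0 <= a <= b.

```python
from fractions import Fraction

def geometric_closed_form(c, a, b):
    """(c^b - c^a) / (c - 1), valid for c != 1."""
    c = Fraction(c)
    return (c ** b - c ** a) / (c - 1)

def geometric_direct(c, a, b):
    """Add c^k term by term over a <= k < b."""
    c = Fraction(c)
    return sum((c ** k for k in range(a, b)), Fraction(0))

# A few ratios to try, including a negative one and two proper fractions.
sample_ratios = [2, 3, Fraction(1, 2), -1, Fraction(-3, 2)]
```

With c = 2, a = 0 and b = 4, both functions produce 1 + 2 + 4 + 8.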
Every time we encounter a function f that might be useful as a closed
form, we can compute its difference Δf = g; then we have a function g whose
indefinite sum $\sum g(x)\,\delta x$ is known. Table 55 is the beginning of a table of
difference/anti-difference pairs useful for summation.
[Margin note: ‘Table 55’ is on page 55. Get it?]
Despite all the parallels between continuous and discrete math, some
continuous notions have no discrete analog. For example, the chain rule of
infinite calculus is a handy rule for the derivative of a function of a function;
but there’s no corresponding chain rule of finite calculus, because there’s no
nice form for $\Delta f(g(x))$. Discrete change-of-variables is hard, except in certain
cases like the replacement of x by $c \pm x$.
However, $\Delta(f(x)\,g(x))$ does have a fairly nice form, and it provides us
with a rule for summation by parts, the finite analog of what infinite calculus
calls integration by parts. Let’s recall that the formula
$$D(uv) = u\,Dv + v\,Du$$
of infinite calculus leads to the rule for integration by parts,
$$\int u\,Dv = uv - \int v\,Du,$$
after integration and rearranging terms; we can do a similar thing in finite
calculus.
[Margin note: Infinite calculus avoids E here by letting 1 → 0.]
[Margin note: I guess $e^x = 2^x$, for small values of 1.]

Table 55 What’s the difference?

  $f = \sum g$                         $\Delta f = g$                      |  $f = \sum g$      $\Delta f = g$
  $x^{\underline{0}} = 1$              $0$                                 |  $2^x$             $2^x$
  $x^{\underline{1}} = x$              $1$                                 |  $c^x$             $(c-1)c^x$
  $x^{\underline{2}} = x(x-1)$         $2x$                                |  $c^x/(c-1)$       $c^x$
  $x^{\underline{m}}$                  $m\,x^{\underline{m-1}}$            |  $cf$              $c\,\Delta f$
  $x^{\underline{m+1}}/(m+1)$          $x^{\underline{m}}$                 |  $f+g$             $\Delta f + \Delta g$
  $H_x$                                $x^{\underline{-1}} = 1/(x+1)$      |  $fg$              $f\,\Delta g + Eg\,\Delta f$
We start by applying the difference operator to the product of two
functions u(x) and v(x):
$$\begin{aligned}
\Delta\bigl(u(x)\,v(x)\bigr) &= u(x+1)\,v(x+1) - u(x)\,v(x) \\
&= u(x+1)\,v(x+1) - u(x)\,v(x+1) + u(x)\,v(x+1) - u(x)\,v(x) \\
&= u(x)\,\Delta v(x) + v(x+1)\,\Delta u(x).
\end{aligned} \tag{2.54}$$
This formula can be put into a convenient form using the shift operator E,
defined by
$$Ef(x) = f(x+1).$$
Substituting this for v(x+1) yields a compact rule for the difference of a
product:
$$\Delta(uv) = u\,\Delta v + Ev\,\Delta u. \tag{2.55}$$
(The E is a bit of a nuisance, but it makes the equation correct.) Taking
the indefinite sum on both sides of this equation, and rearranging its terms,
yields the advertised rule for summation by parts:
$$\sum u\,\Delta v = uv - \sum Ev\,\Delta u. \tag{2.56}$$
As with infinite calculus, limits can be placed on all three terms, making the
indefinite sums definite.
This rule is useful when the sum on the left is harder to evaluate than the
one on the right. Let’s look at an example. The function $\int xe^x\,dx$ is typically
integrated by parts; its discrete analog is $\sum x2^x\,\delta x$, which we encountered
earlier this chapter in the form $\sum_{k=0}^n k2^k$. To sum this by parts, we let
u(x) = x and $\Delta v(x) = 2^x$; hence Δu(x) = 1, $v(x) = 2^x$, and $Ev(x) = 2^{x+1}$.
Plugging into (2.56) gives
$$\sum x2^x\,\delta x = x2^x - \sum 2^{x+1}\,\delta x = x2^x - 2^{x+1} + C.$$
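And we can use this to evaluate the sum we did before, by attaching limits:
$$\begin{aligned}
\sum_{k=0}^n k2^k &= \sum_0^{n+1} x2^x\,\delta x \\
&= x2^x - 2^{x+1}\,\Big|_0^{n+1} \\
&= \bigl((n+1)2^{n+1} - 2^{n+2}\bigr) - \bigl(0\cdot 2^0 - 2^1\bigr) = (n-1)2^{n+1} + 2.
\end{aligned}$$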
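The summation-by-parts result can be cross-checked against a direct sum. In this sketch the names `antidifference`, `direct_sum` and `closed_form` are illustrative only; the first function encodes the anti-difference found above with the constant C taken as 0.

```python
def antidifference(x):
    """x 2^x - 2^(x+1), an anti-difference of x 2^x (taking C = 0)."""
    return x * 2 ** x - 2 ** (x + 1)

def direct_sum(n):
    """Add k 2^k term by term for 0 <= k <= n."""
    return sum(k * 2 ** k for k in range(n + 1))

def closed_form(n):
    """(n - 1) 2^(n+1) + 2, the value obtained by attaching limits."""
    return (n - 1) * 2 ** (n + 1) + 2
```

Attaching the limits 0 and n + 1 means computing `antidifference(n + 1) - antidifference(0)`, which should match both other functions.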
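It’s easier to find the sum this way than to use the perturbation method,
because we don’t have to think.
[Margin note: The ultimate goal of mathematics is to eliminate all need for intelligent thought.]
We stumbled across a formula for $\sum_{0 \le k < n} H_k$ earlier in this chapter,
and counted ourselves lucky. But we could have found our formula (2.36)
systematically, if we had known about summation by parts. Let’s demonstrate
this assertion by tackling a sum that looks even harder, $\sum_{0 \le k < n} kH_k$. The
solution is not difficult if we are guided by analogy with $\int x \ln x\,dx$: We take
$u(x) = H_x$ and $\Delta v(x) = x = x^{\underline{1}}$, hence $\Delta u(x) = x^{\underline{-1}}$, $v(x) = x^{\underline{2}}/2$,
$Ev(x) = (x+1)^{\underline{2}}/2$, and we have
$$\begin{aligned}
\sum xH_x\,\delta x &= \frac{x^{\underline{2}}}{2}H_x - \sum \frac{(x+1)^{\underline{2}}}{2}\,x^{\underline{-1}}\,\delta x \\
&= \frac{x^{\underline{2}}}{2}H_x - \tfrac12 \sum x^{\underline{1}}\,\delta x \\
&= \frac{x^{\underline{2}}}{2}H_x - \frac{x^{\underline{2}}}{4} + C.
\end{aligned}$$
(In going from the first line to the second, we’ve combined two falling powers
$(x+1)^{\underline{2}}\,x^{\underline{-1}}$ by using the law of exponents (2.52) with m = -1 and n = 2.)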
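The anti-difference just derived can be tested before any limits are attached. This sketch uses hypothetical helper names and exact fractions, and it takes the constant C to be 0; with limits 0 and n the anti-difference gives $\frac{n^{\underline{2}}}{2}(H_n - \frac12)$.

```python
from fractions import Fraction

def harmonic(n):
    """H_n = 1/1 + 1/2 + ... + 1/n as an exact fraction (H_0 = 0)."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))

def antidifference(x):
    """(x^2_/2) H_x - x^2_/4, where x^2_ = x(x-1), taking C = 0."""
    falling_square = x * (x - 1)
    return (Fraction(falling_square, 2) * harmonic(x)
            - Fraction(falling_square, 4))

def direct_sum(n):
    """Add k H_k term by term over 0 <= k < n."""
    return sum((k * harmonic(k) for k in range(n)), Fraction(0))
```

Because the anti-difference vanishes at x = 0, `direct_sum(n)` should simply equal `antidifference(n)`.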
Now we can attach limits and conclude that
$$\sum_{0 \le k < n} kH_k = \sum_0^n xH_x\,\delta x = \frac{n^{\underline{2}}}{2}\bigl(H_n - \tfrac12\bigr). \tag{2.57}$$
2.7 INFINITE SUMS
When we defined $\sum$-notation at the beginning of this chapter, we
finessed the question of infinite sums by saying, in essence, “Wait until later.
For now, we can assume that all the sums we meet have only finitely many
nonzero terms.” But the time of reckoning has finally arrived; we must face
the fact that sums can be infinite. And the truth is that infinite sums are
bearers of both good news and bad news.
[Margin note: This is finesse?]
[Margin note: Sure: 1 + 2 + 4 + 8 + ... is the “infinite precision” representation of the number -1, in a binary computer with infinite word size.]
First, the bad news: It turns out that the methods we’ve used for
manipulating $\sum$’s are not always valid when infinite sums are involved. But next,
the good news: There is a large, easily understood class of infinite sums for
which all the operations we’ve been performing are perfectly legitimate. The
reasons underlying both these news items will be clear after we have looked
more closely at the underlying meaning of summation.
Everybody knows what a finite sum is: We add up a bunch of terms, one
by one, until they’ve all been added. But an infinite sum needs to be defined
more carefully, lest we get into paradoxical situations.
For example, it seems natural to define things so that the infinite sum
$$S = 1 + \tfrac12 + \tfrac14 + \tfrac18 + \tfrac1{16} + \tfrac1{32} + \cdots$$
is equal to 2, because if we double it we get
$$2S = 2 + 1 + \tfrac12 + \tfrac14 + \tfrac18 + \tfrac1{16} + \cdots = 2 + S.$$
On the other hand, this same reasoning suggests that we ought to define
$$T = 1 + 2 + 4 + 8 + 16 + 32 + \cdots$$
to be -1, for if we double it we get
$$2T = 2 + 4 + 8 + 16 + 32 + 64 + \cdots = T - 1.$$
Something funny is going on; how can we get a negative number by summing
positive quantities? It seems better to leave T undefined; or perhaps we should
say that $T = \infty$, since the terms being added in T become larger than any
fixed, finite number. (Notice that $\infty$ is another “solution” to the equation
2T = T - 1; it also “solves” the equation 2S = 2 + S.)
Let’s try to formulate a good definition for the value of a general sum
$\sum_{k \in K} a_k$, where K might be infinite. For starters, let’s assume that all the
terms $a_k$ are nonnegative. Then a suitable definition is not hard to find: If
there’s a bounding constant A such that
$$\sum_{k \in F} a_k \le A$$
for all finite subsets $F \subset K$, then we define $\sum_{k \in K} a_k$ to be the least such A.
(It follows from well-known properties of the real numbers that the set of
all such A always contains a smallest element.) But if there’s no bounding
constant A, we say that $\sum_{k \in K} a_k = \infty$; this means that if A is any real
number, there’s a set of finitely many terms $a_k$ whose sum exceeds A.
The definition in the previous paragraph has been formulated carefully
so that it doesn’t depend on any order that might exist in the index set K.
Therefore the arguments we are about to make will apply to multiple sums
with many indices $k_1, k_2, \ldots$, not just to sums over the set of integers.
In the special case that K is the set of nonnegative integers, our definition
for nonnegative terms $a_k$ implies that
$$\sum_{k \ge 0} a_k = \lim_{n \to \infty} \sum_{k=0}^n a_k.$$
Here’s why: Any nondecreasing sequence of real numbers has a limit
(possibly $\infty$). If the limit is A, and if F is any finite set of nonnegative integers
whose elements are all $\le n$, we have $\sum_{k \in F} a_k \le \sum_{k=0}^n a_k \le A$; hence $A = \infty$
or A is a bounding constant. And if A′ is any number less than the stated
limit A, then there’s an n such that $\sum_{k=0}^n a_k > A'$; hence the finite set
$F = \{0, 1, \ldots, n\}$ witnesses to the fact that A′ is not a bounding constant.
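We can now easily compute the value of certain infinite sums, according
to the definition just given. For example, if $a_k = x^k$, we have
$$\sum_{k \ge 0} x^k = \lim_{n \to \infty} \frac{1 - x^{n+1}}{1 - x} = \begin{cases} 1/(1-x), & \text{if } 0 \le x < 1; \\ \infty, & \text{if } x \ge 1. \end{cases}$$
In particular, the infinite sums S and T considered a minute ago have the
respective values 2 and $\infty$, just as we suspected. Another interesting example is
$$\sum_{k \ge 0} \frac{1}{(k+1)(k+2)} = \sum_{k \ge 0} k^{\underline{-2}} = \lim_{n \to \infty} \sum_{k=0}^{n-1} k^{\underline{-2}} = \lim_{n \to \infty} \frac{k^{\underline{-1}}}{-1}\bigg|_0^n = 1.$$
[Margin note: The set K might even be uncountable. But only a countable number of terms can be nonzero, if a bounding constant A exists, because at most nA terms are $\ge 1/n$.]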
Now let’s consider the case that the sum might have negative terms as
well as nonnegative ones. What, for example, should be the value of
$$\sum_{k \ge 0} (-1)^k = 1 - 1 + 1 - 1 + 1 - 1 + \cdots\,?$$
If we group the terms in pairs, we get
$$(1-1) + (1-1) + (1-1) + \cdots = 0 + 0 + 0 + \cdots,$$
so the sum comes out zero; but if we start the pairing one step later, we get
$$1 - (1-1) - (1-1) - (1-1) - \cdots = 1 - 0 - 0 - 0 - \cdots;$$
the sum is 1.
[Margin note: “Aggregatum quantitatum a - a + a - a + a - a etc. nunc est = a, nunc = 0, adeoque continuata in infinitum serie ponendus = a/2, fateor acumen et veritatem animadversionis tuæ.” - G. Grandi [133]]
We might also try setting x = -1 in the formula $\sum_{k \ge 0} x^k = 1/(1-x)$,
since we’ve proved that this formula holds when $0 \le x < 1$; but then we are
forced to conclude that the infinite sum is $\frac12$, although it’s a sum of integers!
Another interesting example is the doubly infinite $\sum_k a_k$ where
$a_k = 1/(k+1)$ for $k \ge 0$ and $a_k = 1/(k-1)$ for k < 0. We can write this as
$$\cdots + (-\tfrac14) + (-\tfrac13) + (-\tfrac12) + 1 + \tfrac12 + \tfrac13 + \tfrac14 + \cdots. \tag{2.58}$$
If we evaluate this sum by starting at the “center” element and working
outward,
$$\cdots + \Bigl(-\tfrac14 + \bigl(-\tfrac13 + \bigl(-\tfrac12 + (1) + \tfrac12\bigr) + \tfrac13\bigr) + \tfrac14\Bigr) + \cdots,$$
we get the value 1; and we obtain the same value 1 if we shift all the
parentheses one step to the left,
$$\cdots + \Bigl(-\tfrac15 + \bigl(-\tfrac14 + \bigl(-\tfrac13 + (-\tfrac12) + 1\bigr) + \tfrac12\bigr) + \tfrac13\Bigr) + \cdots,$$
because the sum of all numbers inside the innermost n parentheses is
$$-\frac{1}{n+1} - \frac1n - \cdots - \frac12 + 1 + \frac12 + \cdots + \frac{1}{n-1} = 1 - \frac1n - \frac{1}{n+1}.$$
A similar argument shows that the value is 1 if these parentheses are shifted
any fixed amount to the left or right; this encourages us to believe that the
sum is indeed 1.
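On the other hand, if we group terms in the following way,
$$\cdots + \Bigl(-\tfrac14 + \bigl(-\tfrac13 + \bigl(-\tfrac12 + 1 + \tfrac12\bigr) + \tfrac13 + \tfrac14\bigr) + \tfrac15 + \tfrac16\Bigr) + \cdots,$$
the nth pair of parentheses from inside out contains the numbers
$$-\frac{1}{n+1} - \frac1n - \cdots - \frac12 + 1 + \frac12 + \cdots + \frac{1}{2n-1} + \frac{1}{2n} = 1 + H_{2n} - H_{n+1}.$$
We’ll prove in Chapter 9 that $\lim_{n \to \infty} (H_{2n} - H_{n+1}) = \ln 2$; hence this grouping
suggests that the doubly infinite sum should really be equal to 1 + ln 2.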
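The three groupings of (2.58) correspond to three different windows of indices, and their partial sums can be compared numerically. This is a sketch with invented names (`term`, `window`, `centered`, `shifted`, `lopsided`); the index ranges are read off from the parenthesized displays above.

```python
from fractions import Fraction
import math

def term(k):
    """a_k = 1/(k+1) for k >= 0 and 1/(k-1) for k < 0, as in (2.58)."""
    return Fraction(1, k + 1) if k >= 0 else Fraction(1, k - 1)

def window(lo, hi):
    """Exact sum of a_k over lo <= k <= hi."""
    return sum((term(k) for k in range(lo, hi + 1)), Fraction(0))

def centered(n):
    """Grouping that grows symmetrically around the center element."""
    return window(-n, n)

def shifted(n):
    """Innermost n parentheses after shifting one step to the left."""
    return window(-n, n - 2)

def lopsided(n):
    """nth grouping that adds two positive terms per negative term."""
    return window(-n, 2 * n - 1)
```

The first two windows stay near 1 as n grows, while `lopsided(n)` drifts toward 1 + ln 2, which is the inconsistency the text goes on to address.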
There’s something flaky about a sum that gives different values when
its terms are added up in different ways. Advanced texts on analysis have
a variety of definitions by which meaningful values can be assigned to such
pathological sums; but if we adopt those definitions, we cannot operate with
$\sum$-notation as freely as we have been doing. We don’t need the delicate
refinements of “conditional convergence” for the purposes of this book; therefore
we’ll stick to a definition of infinite sums that preserves the validity of all the
operations we’ve been doing in this chapter.
[Margin note: Is this the first page with no graffiti?]
In fact, our definition of infinite sums is quite simple. Let K be any
set, and let $a_k$ be a real-valued term defined for each $k \in K$. (Here ‘k’
might actually stand for several indices $k_1, k_2, \ldots$, and K might therefore be
multidimensional.) Any real number x can be written as the difference of its
positive and negative parts,
$$x = x^+ - x^-, \qquad \text{where } x^+ = x\cdot[x > 0] \text{ and } x^- = -x\cdot[x < 0].$$
(Either $x^+ = 0$ or $x^- = 0$.) We’ve already explained how to define values for
the infinite sums $\sum_{k \in K} a_k^+$ and $\sum_{k \in K} a_k^-$, because $a_k^+$ and $a_k^-$ are nonnegative.
Therefore our general definition is
$$\sum_{k \in K} a_k = \sum_{k \in K} a_k^+ - \sum_{k \in K} a_k^-, \tag{2.59}$$
unless the right-hand sums are both equal to $\infty$. In the latter case, we leave
$\sum_{k \in K} a_k$ undefined.
Let $A^+ = \sum_{k \in K} a_k^+$ and $A^- = \sum_{k \in K} a_k^-$. If $A^+$ and $A^-$ are both finite,
the sum $\sum_{k \in K} a_k$ is said to converge absolutely to the value $A = A^+ - A^-$.
If $A^+ = \infty$ but $A^-$ is finite, the sum $\sum_{k \in K} a_k$ is said to diverge to $+\infty$.
Similarly, if $A^- = \infty$ but $A^+$ is finite, $\sum_{k \in K} a_k$ is said to diverge to $-\infty$. If
$A^+ = A^- = \infty$, all bets are off.
[Margin note: In other words, absolute convergence means that the sum of absolute values converges.]
We started with a definition that worked for nonnegative terms, then we
extended it to real-valued terms. If the terms $a_k$ are complex numbers, we
can extend the definition once again, in the obvious way: The sum $\sum_{k \in K} a_k$
is defined to be $\sum_{k \in K} \Re a_k + i \sum_{k \in K} \Im a_k$, where $\Re a_k$ and $\Im a_k$ are the real
and imaginary parts of $a_k$, provided that both of those sums are defined.
Otherwise $\sum_{k \in K} a_k$ is undefined. (See exercise 18.)
The bad news, as stated earlier, is that some infinite sums must be left
undefined, because the manipulations we’ve been doing can produce inconsis-
tencies in all such cases. (See exercise 34.) The good news is that all of the
manipulations of this chapter are perfectly valid whenever we’re dealing with
sums that converge absolutely, as just defined.
We can verify the good news by showing that each of our transformation
rules preserves the value of all absolutely convergent sums. This means, more
explicitly, that we must prove the distributive, associative, and commutative
laws, plus the rule for summing first on one index variable; everything else
we’ve done has been derived from those four basic operations on sums.
The distributive law (2.15) can be formulated more precisely as follows:
If $\sum_{k \in K} a_k$ converges absolutely to A and if c is any complex number, then
$\sum_{k \in K} c\,a_k$ converges absolutely to cA. We can prove this by breaking the sum
into real and imaginary, positive and negative parts as above, and by proving
the special case in which c > 0 and each term $a_k$ is nonnegative. The proof
in this special case works because $\sum_{k \in F} c\,a_k = c \sum_{k \in F} a_k$ for all finite sets F;
the latter fact follows by induction on the size of F.
The associative law (2.16) can be stated as follows: If $\sum_{k \in K} a_k$ and
$\sum_{k \in K} b_k$ converge absolutely to A and B, respectively, then $\sum_{k \in K} (a_k + b_k)$
converges absolutely to A + B. This turns out to be a special case of a more
general theorem that we will prove shortly.
The commutative law (2.17) doesn’t really need to be proved, because
we have shown in the discussion following (2.35) how to derive it as a special
case of a general rule for interchanging the order of summation.
The main result we need to prove is the fundamental principle of multiple
sums: Absolutely convergent sums over two or more indices can always be
summed first with respect to any one of those indices. Formally, we shall
prove that if J and the elements of $\{K_j \mid j \in J\}$ are any sets of indices such that
$$\sum_{j \in J,\ k \in K_j} a_{j,k} \quad \text{converges absolutely to } A,$$
then there exist complex numbers $A_j$ for each $j \in J$ such that
$$\sum_{k \in K_j} a_{j,k} \quad \text{converges absolutely to } A_j, \text{ and}$$
$$\sum_{j \in J} A_j \quad \text{converges absolutely to } A.$$
[Margin note: Best to skim this page the first time you get here. - Your friendly TA]
It suffices to prove this assertion when all terms are nonnegative, because we
can prove the general case by breaking everything into real and imaginary,
positive and negative parts as before. Let’s assume therefore that $a_{j,k} \ge 0$ for
all pairs $(j,k) \in M$, where M is the master index set $\{(j,k) \mid j \in J,\ k \in K_j\}$.
We are given that $\sum_{(j,k) \in M} a_{j,k}$ is finite, namely that
$$\sum_{(j,k) \in F} a_{j,k} \le A$$
for all finite subsets $F \subseteq M$, and that A is the least such upper bound. If j is
any element of J, each sum of the form $\sum_{k \in F_j} a_{j,k}$ where $F_j$ is a finite subset
of $K_j$ is bounded above by A. Hence these finite sums have a least upper
bound $A_j \ge 0$, and $\sum_{k \in K_j} a_{j,k} = A_j$ by definition.
We still need to prove that A is the least upper bound of $\sum_{j \in G} A_j$,
for all finite subsets $G \subseteq J$. Suppose that G is a finite subset of J with
$\sum_{j \in G} A_j = A' > A$. We can find finite subsets $F_j \subseteq K_j$ such that $\sum_{k \in F_j} a_{j,k} >
(A/A')A_j$ for each $j \in G$ with $A_j > 0$. There is at least one such j. But then
$\sum_{j \in G,\ k \in F_j} a_{j,k} > (A/A') \sum_{j \in G} A_j = A$, contradicting the fact that we have
$\sum_{(j,k) \in F} a_{j,k} \le A$ for all finite subsets $F \subseteq M$. Hence $\sum_{j \in G} A_j \le A$, for all
finite subsets $G \subseteq J$.
Finally, let A′ be any real number less than A. Our proof will be complete
if we can find a finite set $G \subseteq J$ such that $\sum_{j \in G} A_j > A'$. We know that
there’s a finite set $F \subseteq M$ such that $\sum_{(j,k) \in F} a_{j,k} > A'$; let G be the set of j’s
in this F, and let $F_j = \{k \mid (j,k) \in F\}$. Then $\sum_{j \in G} A_j \ge \sum_{j \in G} \sum_{k \in F_j} a_{j,k} =
\sum_{(j,k) \in F} a_{j,k} > A'$; QED.
OK, we’re now legitimate! Everything we’ve been doing with infinite
sums is justified, as long as there’s a finite bound on all finite sums of the
absolute values of the terms. Since the doubly infinite sum (2.58) gave us
two different answers when we evaluated it in two different ways, its positive
terms $1 + \frac12 + \frac13 + \cdots$ must diverge to $\infty$; otherwise we would have gotten the
same answer no matter how we grouped the terms.
[Margin note: So why have I been hearing a lot lately about “harmonic convergence”?]
Exercises
Warmups
1 What does the notation $\sum_{k=4}^{0} q_k$ mean?
2 Simplify the expression $x \cdot ([x > 0] - [x < 0])$.
3 Demonstrate your understanding of $\sum$-notation by writing out the sums
in full. (Watch out: the second sum is a bit tricky.)
4 Express the triple sum $\sum_{1 \le i < j < k \le 4} a_{ijk}$ as a three-fold summation (with three $\sum$’s),
a summing first on k, then j, then i;
b summing first on i, then j, then k.
Also write your triple sums out in full without the $\sum$-notation, using
parentheses to show what is being added together first.
5 What’s wrong with the following derivation?
6 What is the value of $\sum_k [1 \le j \le k \le n]$, as a function of j and n?
7 Let $\nabla f(x) = f(x) - f(x-1)$. What is $\nabla(x^{\overline{m}})$?
[Margin note: Yield to the rising power.]
8 What is the value of $0^{\underline{m}}$, when m is a given integer?
9 What is the law of exponents for rising factorial powers, analogous to
(2.52)? Use this to define $x^{\overline{-n}}$.
10 The text derives the following formula for the difference of a product:
$$\Delta(uv) = u\,\Delta v + Ev\,\Delta u.$$
How can this formula be correct, when the left-hand side is symmetric
with respect to u and v but the right-hand side is not?
Basics
11 The general rule (2.56) for summation by parts is equivalent to
$$\sum_{0 \le k < n} (a_{k+1} - a_k)\,b_k = a_n b_n - a_0 b_0 - \sum_{0 \le k < n} a_{k+1}(b_{k+1} - b_k), \qquad \text{for } n \ge 0.$$
Prove this formula directly by using the distributive, associative, and
commutative laws.
12 Show that the function $p(k) = k + (-1)^k c$ is a permutation of the set of
all integers, whenever c is an integer.
13 Use the repertoire method to find a closed form for $\sum_{k=0}^n (-1)^k k^2$.
14 Evaluate $\sum_{k=1}^n k2^k$ by rewriting it as the multiple sum $\sum_{1 \le j \le k \le n} 2^k$.
15 Evaluate $\boxdot_n = \sum_{k=1}^n k^3$ by the text’s Method 5 as follows: First write
$\boxdot_n + \Box_n = 2\sum_{1 \le j \le k \le n} jk$; then apply (2.33).
16 Prove that $x^{\underline{m}}/(x-n)^{\underline{m}} = x^{\underline{n}}/(x-m)^{\underline{n}}$, unless one of the denominators
is zero.
17 Show that the following formulas can be used to convert between rising
and falling factorial powers, for all integers m:
$$\begin{aligned}
x^{\overline{m}} &= (-1)^m (-x)^{\underline{m}} = (x+m-1)^{\underline{m}} = 1/(x-1)^{\underline{-m}}; \\
x^{\underline{m}} &= (-1)^m (-x)^{\overline{m}} = (x-m+1)^{\overline{m}} = 1/(x+1)^{\overline{-m}}.
\end{aligned}$$
(The answer to exercise 9 defines $x^{\overline{-m}}$.)
64 SUMS
18 Let
9%~
and Jz be the real and imaginary parts of the complex num-
ber
z.
The absolute value
Iz/
is
J(!??z)~
+
(3~)~.
A sum
tkeK
ok
of com-
plex terms
ok
is said to converge absolutely when the real-valued sums
t&K
*ak
and
tkEK
?ok both converge absolutely. Prove that tkEK
ok
converges absolutely if and only if there is a bounding constant B such
that xkEF
[oki
< B for
,a11
finite subsets F
E
K.
Homework exercises
19 Use a summation factor to solve the recurrence
$$T_0 = 5;\qquad 2T_n = nT_{n-1} + 3\cdot n!,\qquad\text{for } n>0.$$
20 Try to evaluate $\sum_{k=0}^{n} kH_k$ by the perturbation method, but deduce the
value of $\sum_{k=0}^{n} H_k$ instead.
21 Evaluate the sums $S_n = \sum_{k=0}^{n}(-1)^{n-k}$, $T_n = \sum_{k=0}^{n}(-1)^{n-k}k$, and
$U_n = \sum_{k=0}^{n}(-1)^{n-k}k^2$ by the perturbation method, assuming that $n\ge 0$.
22 Prove Lagrange’s identity (without using induction):
$$\sum_{1\le j<k\le n}(a_jb_k - a_kb_j)^2 = \Bigl(\sum_{k=1}^{n}a_k^2\Bigr)\Bigl(\sum_{k=1}^{n}b_k^2\Bigr) - \Bigl(\sum_{k=1}^{n}a_kb_k\Bigr)^2.$$
This, incidentally, implies Cauchy’s inequality,
$$\Bigl(\sum_{k=1}^{n}a_kb_k\Bigr)^2 \le \Bigl(\sum_{k=1}^{n}a_k^2\Bigr)\Bigl(\sum_{k=1}^{n}b_k^2\Bigr).$$
It’s hard to prove the identity of
23 Evaluate the sum $\sum_{k=1}^{n} (2k+1)/\bigl(k(k+1)\bigr)$ in two ways:
a Replace $1/k(k+1)$ by the “partial fractions” $1/k - 1/(k+1)$.
b Sum by parts.
24 What is $\sum_{0\le k<n} H_k/(k+1)(k+2)$? Hint: Generalize the derivation of
(2.57).
25 The notation $\prod_{k\in K} a_k$ means the product of the numbers $a_k$ for all $k \in K$.
Assume for simplicity that $a_k \ne 1$ for only finitely many k; hence infinite
products need not be defined. What laws does this $\prod$-notation satisfy,
analogous to the distributive, associative, and commutative laws that
hold for $\sum$?
This notation was introduced by Jacobi in 1829 [162].
26 Express the double product $\prod_{1\le j\le k\le n} a_ja_k$ in terms of the single product
$\prod_{k=1}^{n} a_k$ by manipulating $\prod$-notation. (This exercise gives us a product
analog of the upper-triangle identity (2.33).)
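27 Compute $\Delta(c^{\underline{x}})$, and use it to deduce the value of $\sum_{k=1}^{n} (-2)^{\underline{k}}/k$.
28 At what point does the following derivation go astray?
$$\begin{aligned}
1 &= \sum_{k\ge1}\frac{1}{k(k+1)} = \sum_{k\ge1}\Bigl(\frac{k}{k+1}-\frac{k-1}{k}\Bigr)\\
&= \sum_{k\ge1}\sum_{j\ge1}\Bigl(\frac{k}{j}[j=k+1]-\frac{j}{k}[j=k-1]\Bigr)\\
&= \sum_{j\ge1}\sum_{k\ge1}\Bigl(\frac{k}{j}[j=k+1]-\frac{j}{k}[j=k-1]\Bigr)\\
&= \sum_{j\ge1}\sum_{k\ge1}\Bigl(\frac{k}{j}[k=j-1]-\frac{j}{k}[k=j+1]\Bigr)\\
&= \sum_{j\ge1}\Bigl(\frac{j-1}{j}-\frac{j}{j+1}\Bigr) = \sum_{j\ge1}\frac{-1}{j(j+1)} = -1.
\end{aligned}$$
Exam problems
29 Evaluate the sum $\sum_{k=1}^{n} (-1)^k k/(4k^2-1)$.
30 Cribbage players have long been aware that 15 = 7 + 8 = 4 + 5 + 6 =
1 + 2 + 3 + 4 + 5. Find the number of ways to represent 1050 as a sum of
consecutive positive integers. (The trivial representation ‘1050’ by itself
counts as one way; thus there are four, not three, ways to represent 15
as a sum of consecutive positive integers. Incidentally, a knowledge of
cribbage rules is of no use in this problem.)
31 Riemann’s zeta function $\zeta(k)$ is defined to be the infinite sum
$$1 + \frac{1}{2^k} + \frac{1}{3^k} + \cdots = \sum_{j\ge1}\frac{1}{j^k}.$$
Prove that $\sum_{k\ge2}\bigl(\zeta(k) - 1\bigr) = 1$. What is the value of $\sum_{k\ge1}\bigl(\zeta(2k) - 1\bigr)$?
32 Let $a \mathbin{\dot{-}} b = \max(0, a - b)$. Prove that
$$\sum_{k\ge0}\min(k,\ x\mathbin{\dot{-}}k) = \sum_{k\ge0}\bigl(x\mathbin{\dot{-}}(2k+1)\bigr)$$
for all real $x \ge 0$, and evaluate the sums in closed form.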
Bonus problems
33 Let $\bigwedge_{k\in K} a_k$ denote the minimum of the numbers $a_k$ (or their greatest
lower bound, if K is infinite), assuming that each $a_k$ is either real or $\pm\infty$.
What laws are valid for $\bigwedge$-notation, analogous to those that work for $\sum$
and $\prod$? (See exercise 25.)
The laws of the jungle.
34 Prove that if the sum $\sum_{k\in K} a_k$ is undefined according to (2.59), then it
is extremely flaky in the following sense: If $A^-$ and $A^+$ are any given
real numbers, it’s possible to find a sequence of finite subsets
$F_1 \subset F_2 \subset F_3 \subset \cdots$ of K such that
$$\sum_{k\in F_n} a_k \le A^-,\ \text{when n is odd};\qquad \sum_{k\in F_n} a_k \ge A^+,\ \text{when n is even}.$$
35 Prove Goldbach’s theorem
$$1 = \frac13+\frac17+\frac18+\frac1{15}+\frac1{24}+\frac1{26}+\frac1{31}+\frac1{35}+\cdots = \sum_{k\in P}\frac{1}{k-1},$$
where P is the set of “perfect powers” defined recursively as follows:
$$P = \{\,m^n \mid m\ge2,\ n\ge2,\ m\notin P\,\}.$$
Perfect power corrupts perfectly.
36 Solomon Golomb’s “self-describing sequence” $\langle f(1), f(2), f(3), \ldots\rangle$ is the
only nondecreasing sequence of positive integers with the property that
it contains exactly f(k) occurrences of k for each k. A few moments’
thought reveals that the sequence must begin as follows:
n     1  2  3  4  5  6  7  8  9 10 11 12
f(n)  1  2  2  3  3  4  4  4  5  5  5  6
Let g(n) be the largest integer m such that f(m) = n. Show that
a $g(n) = \sum_{k=1}^{n} f(k)$.
b $g(g(n)) = \sum_{k=1}^{n} kf(k)$.
c $g(g(g(n))) = \frac12 n\,g(n)\bigl(g(n)+1\bigr) - \frac12\sum_{k=1}^{n-1} g(k)\bigl(g(k)+1\bigr)$.
Research problem
37 Will all the 1/k by 1/(k+1) rectangles, for $k \ge 1$, fit together inside a
1 by 1 square? (Recall that their areas sum to 1.)
3
Integer Functions
)Ouch.(
WHOLE NUMBERS constitute the backbone of discrete mathematics, and we
often need to convert from fractions or arbitrary real numbers to integers. Our
goal in this chapter is to gain familiarity and fluency with such conversions
and to learn some of their remarkable properties.
3.1 FLOORS AND CEILINGS
We start by covering the floor (greatest integer) and ceiling (least
integer) functions, which are defined for all real x as follows:
$$\begin{aligned}
\lfloor x\rfloor &= \text{the greatest integer less than or equal to } x;\\
\lceil x\rceil &= \text{the least integer greater than or equal to } x.
\end{aligned}\tag{3.1}$$
Kenneth E. Iverson introduced this notation, as well as the names “floor” and
“ceiling,” early in the 1960s [161, page 12]. He found that typesetters could
handle the symbols by shaving the tops and bottoms off of ‘[’ and ‘]’. His
notation has become sufficiently popular that floor and ceiling brackets can
now be used in a technical paper without an explanation of what they mean.
Until recently, people had most often been writing ‘[x]’ for the greatest integer
$\le x$, without a good equivalent for the least integer function. Some authors
had even tried to use ‘]x[’, with a predictable lack of success.
Besides variations in notation, there are variations in the functions themselves.
For example, some pocket calculators have an INT function, defined
as $\lfloor x\rfloor$ when x is positive and $\lceil x\rceil$ when x is negative. The designers of
these calculators probably wanted their INT function to satisfy the identity
INT(−x) = −INT(x). But we’ll stick to our floor and ceiling functions,
because they have even nicer properties than this.
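The difference between INT and the floor function can be tried out directly. The following is a minimal sketch in Python; the name `int_calc` is our own stand-in for the calculator function, not anything defined in the text.

```python
import math

def int_calc(x):
    """Calculator-style INT: floor for nonnegative x, ceiling for negative x."""
    return math.floor(x) if x >= 0 else math.ceil(x)

for x in (2.5, -2.5, 3.0, -3.0, 0.25):
    # INT agrees with truncation toward zero, and INT(-x) = -INT(x).
    assert int_calc(x) == math.trunc(x)
    assert int_calc(-x) == -int_calc(x)

# The floor function does not have this symmetry: floor(-2.5) is -3, not -2.
floor_is_odd = all(math.floor(-x) == -math.floor(x) for x in (2.5, 0.25))
```

Here `floor_is_odd` comes out false, which is the price we pay for the nicer properties developed in the rest of this section.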
One good way to become familiar with the floor and ceiling functions
is to understand their graphs, which form staircase-like patterns above and
below the line f(x) = x:
We see from the graph that, for example,
$$\lfloor e\rfloor = 2,\qquad \lfloor -e\rfloor = -3,\qquad \lceil e\rceil = 3,\qquad \lceil -e\rceil = -2,$$
since e = 2.71828... .
By staring at this illustration we can observe several facts about floors
and ceilings. First, since the floor function lies on or below the diagonal line
f(x) = x, we have $\lfloor x\rfloor \le x$; similarly $\lceil x\rceil \ge x$. (This, of course, is quite
obvious from the definition.) The two functions are equal precisely at the
integer points:
$$\lfloor x\rfloor = x \iff x \text{ is an integer} \iff \lceil x\rceil = x.$$
(We use the notation ‘$\iff$’ to mean “if and only if.”) Furthermore, when
they differ the ceiling is exactly 1 higher than the floor:
$$\lceil x\rceil - \lfloor x\rfloor = [x \text{ is not an integer}].\tag{3.2}$$
Cute. By Iverson’s bracket conventions this is a complete equation.
If we shift the diagonal line down one unit, it lies completely below the floor
function, so $x - 1 < \lfloor x\rfloor$; similarly $x + 1 > \lceil x\rceil$. Combining these observations
gives us
$$x-1 < \lfloor x\rfloor \le x \le \lceil x\rceil < x+1.\tag{3.3}$$
Finally, the functions are reflections of each other about both axes:
$$\lfloor -x\rfloor = -\lceil x\rceil;\qquad \lceil -x\rceil = -\lfloor x\rfloor.\tag{3.4}$$
Next week we’re getting walls.
Thus each is easily expressible in terms of the other. This fact helps to
explain why the ceiling function once had no notation of its own. But we
see ceilings often enough to warrant giving them special symbols, just as we
have adopted special notations for rising powers as well as falling powers.
Mathematicians have long had both sine and cosine, tangent and cotangent,
secant and cosecant, max and min; now we also have both floor and ceiling.
To actually prove properties about the floor and ceiling functions, rather
than just to observe such facts graphically, the following four rules are especially
useful:
$$\begin{aligned}
\lfloor x\rfloor = n &\iff n \le x < n+1, &&\text{(a)}\\
\lfloor x\rfloor = n &\iff x-1 < n \le x, &&\text{(b)}\\
\lceil x\rceil = n &\iff n-1 < x \le n, &&\text{(c)}\\
\lceil x\rceil = n &\iff x \le n < x+1. &&\text{(d)}
\end{aligned}\tag{3.5}$$
(We assume in all four cases that n is an integer and that x is real.) Rules
(a) and (c) are immediate consequences of definition (3.1); rules (b) and (d)
are the same but with the inequalities rearranged so that n is in the middle.
It’s possible to move an integer term in or out of a floor (or ceiling):
$$\lfloor x + n\rfloor = \lfloor x\rfloor + n,\qquad\text{integer } n.\tag{3.6}$$
(Because rule (3.5(a)) says that this assertion is equivalent to the inequalities
$\lfloor x\rfloor + n \le x + n < \lfloor x\rfloor + n + 1$.) But similar operations, like moving out a
constant factor, cannot be done in general. For example, we have $\lfloor nx\rfloor \ne n\lfloor x\rfloor$
when n = 2 and x = 1/2. This means that floor and ceiling brackets are
comparatively inflexible. We are usually happy if we can get rid of them or if
we can prove anything at all when they are present.
It turns out that there are many situations in which floor and ceiling
brackets are redundant, so that we can insert or delete them at will. For
example, any inequality between a real and an integer is equivalent to a floor
or ceiling inequality between integers:
$$\begin{aligned}
x < n &\iff \lfloor x\rfloor < n, &&\text{(a)}\\
n < x &\iff n < \lceil x\rceil, &&\text{(b)}\\
x \le n &\iff \lceil x\rceil \le n, &&\text{(c)}\\
n \le x &\iff n \le \lfloor x\rfloor. &&\text{(d)}
\end{aligned}\tag{3.7}$$
These rules are easily proved. For example, if x < n then surely $\lfloor x\rfloor < n$, since
$\lfloor x\rfloor \le x$. Conversely, if $\lfloor x\rfloor < n$ then we must have x < n, since $x < \lfloor x\rfloor + 1$
and $\lfloor x\rfloor + 1 \le n$.
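The four rules of (3.7) are easy to spot-check by machine. The sketch below is only an experiment, not a proof: it uses exact rationals from Python's `fractions` module as stand-ins for real numbers, and the helper name `rules_3_7_hold` is our own.

```python
from fractions import Fraction
from math import floor, ceil

def rules_3_7_hold(x, n):
    """Check the four equivalences of (3.7) for a number x and an integer n."""
    return ((x < n) == (floor(x) < n) and      # (a)
            (n < x) == (n < ceil(x)) and       # (b)
            (x <= n) == (ceil(x) <= n) and     # (c)
            (n <= x) == (n <= floor(x)))       # (d)

# Exact rationals with denominators 1..4 serve as test points.
points = [Fraction(p, q) for q in range(1, 5) for p in range(-12, 13)]
all_hold = all(rules_3_7_hold(x, n) for x in points for n in range(-4, 5))
```

Swapping floor for ceiling in any one of the four lines makes the check fail at some test point, which is a quick way to see why we need to think twice about which bracket goes where.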
It would be nice if the four rules in (3.7) were as easy to remember as
they are to prove. Each inequality without floor or ceiling corresponds to the
same inequality with floor or with ceiling; but we need to think twice before
deciding which of the two is appropriate.
The difference between x and $\lfloor x\rfloor$ is called the fractional part of x, and
it arises often enough in applications to deserve its own notation:
$$\{x\} = x - \lfloor x\rfloor.\tag{3.8}$$
We sometimes call $\lfloor x\rfloor$ the integer part of x, since $x = \lfloor x\rfloor + \{x\}$. If a real
number x can be written in the form $x = n + \theta$, where n is an integer and
$0 \le \theta < 1$, we can conclude by (3.5(a)) that $n = \lfloor x\rfloor$ and $\theta = \{x\}$.
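Identity (3.6) doesn’t hold if n is an arbitrary real. But we can deduce
that there are only two possibilities for $\lfloor x + y\rfloor$ in general: If we write
$x = \lfloor x\rfloor + \{x\}$ and $y = \lfloor y\rfloor + \{y\}$, then we have
$\lfloor x + y\rfloor = \lfloor x\rfloor + \lfloor y\rfloor + \lfloor\{x\} + \{y\}\rfloor$.
And since $0 \le \{x\} + \{y\} < 2$, we find that sometimes $\lfloor x + y\rfloor$ is $\lfloor x\rfloor + \lfloor y\rfloor$,
otherwise it’s $\lfloor x\rfloor + \lfloor y\rfloor + 1$.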
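The two cases for the floor of a sum can be seen in a small experiment. This is a sketch with exact rationals; the helper names `frac` and `floor_sum_carry` are ours.

```python
from fractions import Fraction
from math import floor

def frac(x):
    """Fractional part {x} = x - floor(x), as in (3.8)."""
    return x - floor(x)

def floor_sum_carry(x, y):
    """Return floor(x + y) - floor(x) - floor(y), which should be 0 or 1."""
    return floor(x + y) - floor(x) - floor(y)

x, y = Fraction(7, 4), Fraction(1, 2)
carry = floor_sum_carry(x, y)            # fractional parts 3/4 + 1/2 reach 1
no_carry = floor_sum_carry(Fraction(1, 4), Fraction(1, 2))
```

In both cases the result equals the floor of `frac(x) + frac(y)`, so the extra 1 appears exactly when the fractional parts add up to 1 or more.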
3.2
FLOOR/CEILING APPLICATIONS
We’ve now seen the basic tools for handling floors and ceilings. Let’s
put them to use, starting with an easy problem: What’s $\lceil\lg 35\rceil$? (We use ‘lg’
to denote the base-2 logarithm.) Well, since $2^5 < 35 \le 2^6$, we can take logs
to get $5 < \lg 35 \le 6$; so (3.5(c)) tells us that $\lceil\lg 35\rceil = 6$.
Note that the number 35 is six bits long when written in radix 2 notation:
$35 = (100011)_2$. Is it always true that $\lceil\lg n\rceil$ is the length of n written in
binary? Not quite. We also need six bits to write $32 = (100000)_2$. So $\lceil\lg n\rceil$
is the wrong answer to the problem. (It fails only when n is a power of 2,
but that’s infinitely many failures.) We can find a correct answer by realizing
that it takes m bits to write each number n such that $2^{m-1} \le n < 2^m$; thus
(3.5(a)) tells us that $m - 1 = \lfloor\lg n\rfloor$, so $m = \lfloor\lg n\rfloor + 1$. That is, we need
$\lfloor\lg n\rfloor + 1$ bits to express n in binary, for all n > 0. Alternatively, a similar
derivation yields the answer $\lceil\lg(n + 1)\rceil$; this formula holds for n = 0 as well,
if we’re willing to say that it takes zero bits to write n = 0 in binary.
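Both bit-length formulas can be compared with a language's built-in notion of binary length. The sketch below avoids floating-point logarithms by computing the floor and ceiling of lg n with integer arithmetic; the function names are our own.

```python
def floor_lg(n):
    """floor(lg n) for an integer n > 0, by repeated halving."""
    k = 0
    while n > 1:
        n //= 2
        k += 1
    return k

def ceil_lg(n):
    """ceil(lg n) for an integer n > 0: the least k with 2**k >= n."""
    k = 0
    while 2 ** k < n:
        k += 1
    return k

def bits(n):
    """Number of bits needed to write n > 0 in binary: floor(lg n) + 1."""
    return floor_lg(n) + 1

six_bits = (bits(35), bits(32))     # both numbers need six bits
wrong_for_32 = ceil_lg(32)          # ceil(lg 32) is 5, one bit too few
```

For every n > 0 tried, `bits(n)`, `ceil_lg(n + 1)`, and Python's `n.bit_length()` agree.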
Let’s look next at expressions with several floors or ceilings. What is
$\lceil\lfloor x\rfloor\rceil$? Easy: since $\lfloor x\rfloor$ is an integer, $\lceil\lfloor x\rfloor\rceil$ is just $\lfloor x\rfloor$. So is any other
expression with an innermost $\lfloor x\rfloor$ surrounded by any number of floors or ceilings.
Here’s a tougher problem: Prove or disprove the assertion
$$\bigl\lfloor\sqrt{\lfloor x\rfloor}\bigr\rfloor = \lfloor\sqrt{x}\rfloor,\qquad\text{real } x\ge 0.\tag{3.9}$$
Equality obviously holds when x is an integer, because $x = \lfloor x\rfloor$. And there’s
equality in the special cases $\pi = 3.14159\ldots$, $e = 2.71828\ldots$, and
$\phi = (1+\sqrt5)/2 = 1.61803\ldots$, because we get 1 = 1. Our failure to find a
counterexample suggests that equality holds in general, so let’s try to prove it.
Hmmm. We’d better not write {x} for the fractional part when it could be confused with the set containing x as its only element.
The second case occurs if and only if there’s a “carry” at the position of the decimal point, when the fractional parts {x} and {y} are added together.
(Of course $\pi$, $e$, and $\phi$ are the obvious first real numbers to try, aren’t they?)
Skepticism is healthy only to a limited extent. Being skeptical about proofs and
programs (particularly your own) will probably keep your grades healthy and
your job fairly secure. But applying that much skepticism will probably also keep
you shut away working all the time, instead of letting you get out for exercise and
relaxation. Too much skepticism is an open invitation to the state of rigor mortis,
where you become so worried about being correct and rigorous that you never get
anything finished.
-A skeptic
(This observation was made by R. J. McEliece when he was an undergrad.)
Incidentally, when we’re faced with a “prove or disprove,” we’re usually
better off trying first to disprove with a counterexample, for two reasons:
A disproof is potentially easier (we need just one counterexample); and nit-
picking arouses our creative juices.
Even if the given assertion is true, our
search for a counterexample often leads us to a proof, as soon as we see why
a counterexample is impossible. Besides, it’s healthy to be skeptical.
If we try to prove that $\lfloor\sqrt{\lfloor x\rfloor}\rfloor = \lfloor\sqrt{x}\rfloor$ with the help of calculus, we might
start by decomposing x into its integer and fractional parts $\lfloor x\rfloor + \{x\} = n + \theta$
and then expanding the square root using the binomial theorem:
$(n+\theta)^{1/2} = n^{1/2} + n^{-1/2}\theta/2 - n^{-3/2}\theta^2/8 + \cdots$.
But this approach gets pretty messy.
It’s much easier to use the tools we’ve developed. Here’s a possible strategy:
Somehow strip off the outer floor and square root of $\lfloor\sqrt{\lfloor x\rfloor}\rfloor$, then remove
the inner floor, then add back the outer stuff to get $\lfloor\sqrt{x}\rfloor$. OK. We let
$m = \lfloor\sqrt{\lfloor x\rfloor}\rfloor$ and invoke (3.5(a)), giving $m \le \sqrt{\lfloor x\rfloor} < m + 1$. That removes
the outer floor bracket without losing any information. Squaring, since all
three expressions are nonnegative, we have $m^2 \le \lfloor x\rfloor < (m + 1)^2$. That gets
rid of the square root. Next we remove the floor, using (3.7(d)) for the left
inequality and (3.7(a)) for the right: $m^2 \le x < (m + 1)^2$. It’s now a simple
matter to retrace our steps, taking square roots to get $m \le \sqrt{x} < m + 1$ and
invoking (3.5(a)) to get $m = \lfloor\sqrt{x}\rfloor$. Thus $\lfloor\sqrt{\lfloor x\rfloor}\rfloor = m = \lfloor\sqrt{x}\rfloor$; the assertion
is true. Similarly, we can prove that
$$\bigl\lceil\sqrt{\lceil x\rceil}\bigr\rceil = \lceil\sqrt{x}\rceil,\qquad\text{real } x\ge 0.$$
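Both square-root identities can be checked numerically without trusting floating-point square roots. In the sketch below, `floor_sqrt` and `ceil_sqrt` are our own helpers, defined straight from the inequalities $m^2 \le x < (m+1)^2$ and $(m-1)^2 < x \le m^2$, and the test points are exact rationals.

```python
from fractions import Fraction
from math import floor, ceil

def floor_sqrt(x):
    """floor(sqrt(x)) for x >= 0: the largest integer m with m*m <= x."""
    m = 0
    while (m + 1) ** 2 <= x:
        m += 1
    return m

def ceil_sqrt(x):
    """ceil(sqrt(x)) for x >= 0: the smallest integer m with m*m >= x."""
    m = 0
    while m * m < x:
        m += 1
    return m

# Rationals with denominator 7 between 0 and 30.
samples = [Fraction(p, 7) for p in range(0, 7 * 30)]
floors_agree = all(floor_sqrt(floor(x)) == floor_sqrt(x) for x in samples)
ceils_agree = all(ceil_sqrt(ceil(x)) == ceil_sqrt(x) for x in samples)
```

Both flags come out true, as the proof says they must.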
The proof we just found doesn’t rely heavily on the properties of square
roots. A closer look shows that we can generalize the ideas and prove much
more: Let f(x) be any continuous, monotonically increasing function with the
property that
$$f(x) = \text{integer} \implies x = \text{integer}.$$
(The symbol ‘$\implies$’ means “implies.”) Then we have
$$\lfloor f(x)\rfloor = \lfloor f(\lfloor x\rfloor)\rfloor \qquad\text{and}\qquad \lceil f(x)\rceil = \lceil f(\lceil x\rceil)\rceil,\tag{3.10}$$
whenever f(x), $f(\lfloor x\rfloor)$, and $f(\lceil x\rceil)$ are defined. Let’s prove this general property
for ceilings, since we did floors earlier and since the proof for floors is
almost the same. If $x = \lceil x\rceil$, there’s nothing to prove. Otherwise $x < \lceil x\rceil$,
and $f(x) < f(\lceil x\rceil)$ since f is increasing. Hence $\lceil f(x)\rceil \le \lceil f(\lceil x\rceil)\rceil$, since $\lceil\;\rceil$ is
nondecreasing. If $\lceil f(x)\rceil < \lceil f(\lceil x\rceil)\rceil$, there must be a number y such that
$x \le y < \lceil x\rceil$ and $f(y) = \lceil f(x)\rceil$, since f is continuous. This y is an integer,
because of f’s special property. But there cannot be an integer strictly between
x and $\lceil x\rceil$. This contradiction implies that we must have $\lceil f(x)\rceil = \lceil f(\lceil x\rceil)\rceil$.
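An important special case of this theorem is worth noting explicitly:
$$\Bigl\lfloor\frac{x+m}{n}\Bigr\rfloor = \Bigl\lfloor\frac{\lfloor x\rfloor+m}{n}\Bigr\rfloor \qquad\text{and}\qquad \Bigl\lceil\frac{x+m}{n}\Bigr\rceil = \Bigl\lceil\frac{\lceil x\rceil+m}{n}\Bigr\rceil,\tag{3.11}$$
if m and n are integers and the denominator n is positive. For example, let
m = 0; we have $\lfloor\lfloor\lfloor x/10\rfloor/10\rfloor/10\rfloor = \lfloor x/1000\rfloor$. Dividing thrice by 10 and
throwing off digits is the same as dividing by 1000 and tossing the remainder.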
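Taking f(x) = (x + m)/n in (3.10) gives a statement about integer division that is easy to test. The sketch below checks the floor version with exact rationals; the function names `direct` and `nested` are ours, and Python's `//` operator is used as floored division by a positive integer.

```python
from fractions import Fraction
from math import floor

def direct(x, m, n):
    """floor((x + m) / n), computed exactly."""
    return floor((x + m) / n)

def nested(x, m, n):
    """floor((floor(x) + m) / n), using floored integer division."""
    return (floor(x) + m) // n

x = Fraction(123456, 7)                 # about 17636.57
thrice = floor(x / 10) // 10 // 10      # divide by 10 three times, dropping digits
once = floor(x / 1000)                  # divide by 1000 in one step
```

Both `thrice` and `once` equal 17 here, and `direct` agrees with `nested` for every combination of rational x, integer m, and positive integer n tried in the accompanying checks.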
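Let’s try now to prove or disprove another statement:
$$\bigl\lceil\sqrt{\lfloor x\rfloor}\bigr\rceil = \lceil\sqrt{x}\rceil,\qquad\text{real } x\ge 0.$$
This works when $x = \pi$ and $x = e$, but it fails when $x = \phi$; so we know that
it isn’t true in general.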
Before going any further, let’s digress a minute to discuss different “levels”
of questions that can be asked in books about mathematics:
Level 1. Given an explicit object x and an explicit property P(x), prove that
P(x) is true. For example, “Prove that $\lfloor\pi\rfloor = 3$.” Here the problem involves
finding a proof of some purported fact.
Level 2. Given an explicit set X and an explicit property P(x), prove that
P(x) is true for all $x \in X$. For example, “Prove that $\lfloor x\rfloor \le x$ for all real x.”
Again the problem involves finding a proof, but the proof this time must be
general. We’re doing algebra, not just arithmetic.
Level 3. Given an explicit set X and an explicit property P(x), prove or
disprove that P(x) is true for all $x \in X$. For example, “Prove or disprove
that $\lceil\sqrt{\lfloor x\rfloor}\rceil = \lceil\sqrt{x}\rceil$ for all real $x \ge 0$.” Here there’s an additional level
of uncertainty; the outcome might go either way. This is closer to the real
situation a mathematician constantly faces: Assertions that get into books
tend to be true, but new things have to be looked at with a jaundiced eye. If
the statement is false, our job is to find a counterexample. If the statement
is true, we must find a proof as in level 2.
In my other texts “prove or disprove” seems to mean the same as “prove,” about 99.44% of the time; but not in this book.
Level 4. Given an explicit set X and an explicit property P(x), find a necessary
and sufficient condition Q(x) that P(x) is true. For example, “Find a
necessary and sufficient condition that $\lfloor x\rfloor \ge \lceil x\rceil$.” The problem is to find Q
such that $P(x) \iff Q(x)$. Of course, there’s always a trivial answer; we can
take Q(x) = P(x). But the implied requirement is to find a condition that’s as
simple as possible. Creativity is required to discover a simple condition that
will work. (For example, in this case, “$\lfloor x\rfloor \ge \lceil x\rceil \iff x$ is an integer.”) The
extra element of discovery needed to find Q(x) makes this sort of problem
more difficult, but it’s more typical of what mathematicians must do in the
“real world.” Finally, of course, a proof must be given that P(x) is true if and
only if Q(x) is true.
But no simpler.
-A. Einstein
Level 5. Given an explicit set X, find an interesting property P(x) of its
elements. Now we’re in the scary domain of pure research, where students
might think that total chaos reigns. This is real mathematics. Authors of
textbooks rarely dare to ask level 5 questions.
Home of the
Toledo
Mudhens.
End of digression. But let’s convert our last question from level 3 to
level 4: What is a necessary and sufficient condition that $\lceil\sqrt{\lfloor x\rfloor}\rceil = \lceil\sqrt{x}\rceil$?
We have observed that equality holds when x = 3.142 but not when x = 1.618;
further experimentation shows that it fails also when x is between 9 and 10.
Oho. Yes. We see that bad cases occur whenever $m^2 < x < m^2 + 1$, since this
gives m on the left and m + 1 on the right. In all other cases where $\sqrt{x}$ is
defined, namely when x = 0 or $m^2 + 1 \le x \le (m + 1)^2$, we get equality. The
following statement is therefore necessary and sufficient for equality: Either
x is an integer or $\sqrt{\lfloor x\rfloor}$ isn’t.
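The level 4 answer can be tested against the level 3 question by brute force. The sketch below uses exact rational test points; `ceil_sqrt`, `equality_holds`, and `condition` are names of our own choosing.

```python
from fractions import Fraction
from math import floor, isqrt

def ceil_sqrt(x):
    """ceil(sqrt(x)) for x >= 0: the smallest integer m with m*m >= x."""
    m = 0
    while m * m < x:
        m += 1
    return m

def equality_holds(x):
    """Does ceil(sqrt(floor(x))) equal ceil(sqrt(x))?"""
    return ceil_sqrt(floor(x)) == ceil_sqrt(x)

def condition(x):
    """'Either x is an integer or sqrt(floor(x)) isn't.'"""
    n = floor(x)
    sqrt_floor_is_integer = isqrt(n) ** 2 == n
    return x == n or not sqrt_floor_is_integer

# Rationals with denominator 8 between 0 and 40.
samples = [Fraction(p, 8) for p in range(0, 8 * 40 + 1)]
condition_matches = all(equality_holds(x) == condition(x) for x in samples)
```

The condition and the equality agree at every sample, including the bad cases just above a perfect square such as x = 9.5.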
(Or, by pessimists,
half-closed.)
For our next problem let’s consider a handy new notation, suggested
by C. A. R. Hoare and Lyle Ramshaw, for intervals of the real line: $[\alpha\,..\,\beta]$
denotes the set of real numbers x such that $\alpha \le x \le \beta$. This set is called
a closed interval because it contains both endpoints $\alpha$ and $\beta$. The interval
containing neither endpoint, denoted by $(\alpha\,..\,\beta)$, consists of all x such that
$\alpha < x < \beta$; this is called an open interval. And the intervals $[\alpha\,..\,\beta)$ and
$(\alpha\,..\,\beta]$, which contain just one endpoint, are defined similarly and called
half-open.
How many integers are contained in such intervals? The half-open intervals
are easier, so we start with them. In fact half-open intervals are almost
always nicer than open or closed intervals. For example, they’re additive: we
can combine the half-open intervals $[\alpha\,..\,\beta)$ and $[\beta\,..\,\gamma)$ to form the half-open
interval $[\alpha\,..\,\gamma)$. This wouldn’t work with open intervals because the point $\beta$
would be excluded, and it could cause problems with closed intervals because
$\beta$ would be included twice.
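Back to our problem. The answer is easy if $\alpha$ and $\beta$ are integers: Then
$[\alpha\,..\,\beta)$ contains the $\beta-\alpha$ integers $\alpha$, $\alpha+1$, ..., $\beta-1$, assuming that $\alpha \le \beta$.
Similarly $(\alpha\,..\,\beta]$ contains $\beta - \alpha$ integers in such a case. But our problem is
harder, because $\alpha$ and $\beta$ are arbitrary reals. We can convert it to the easier
problem, though, since
$$\begin{aligned}
\alpha \le n < \beta &\iff \lceil\alpha\rceil \le n < \lceil\beta\rceil,\\
\alpha < n \le \beta &\iff \lfloor\alpha\rfloor < n \le \lfloor\beta\rfloor,
\end{aligned}$$
when n is an integer, according to (3.7). The intervals on the right have
integer endpoints and contain the same number of integers as those on the left,
which have real endpoints. So the interval $[\alpha\,..\,\beta)$ contains exactly $\lceil\beta\rceil - \lceil\alpha\rceil$
integers, and $(\alpha\,..\,\beta]$ contains $\lfloor\beta\rfloor - \lfloor\alpha\rfloor$. This is a case where we actually
want to introduce floor or ceiling brackets, instead of getting rid of them.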
By the way, there’s a mnemonic for remembering which case uses floors
and which uses ceilings: Half-open intervals that include the left endpoint
but not the right (such as $0 \le \theta < 1$) are slightly more common than those
that include the right endpoint but not the left; and floors are slightly more
common than ceilings. So by Murphy’s Law, the correct rule is the opposite
of what we’d expect: ceilings for $[\alpha\,..\,\beta)$ and floors for $(\alpha\,..\,\beta]$.
Just like we can remember the date of Columbus’s departure by singing, “In fourteen hundred and ninety-three / Columbus sailed the deep blue sea.”
Similar analyses show that the closed interval $[\alpha\,..\,\beta]$ contains exactly
$\lfloor\beta\rfloor - \lceil\alpha\rceil + 1$ integers and that the open interval $(\alpha\,..\,\beta)$ contains $\lceil\beta\rceil - \lfloor\alpha\rfloor - 1$;
but we place the additional restriction $\alpha \ne \beta$ on the latter so that the formula
won’t ever embarrass us by claiming that an empty interval $(\alpha\,..\,\alpha)$ contains
a total of −1 integers. To summarize, we’ve deduced the following facts:
$$\begin{array}{lll}
\text{interval} & \text{integers contained} & \text{restrictions}\\
[\alpha\,..\,\beta] & \lfloor\beta\rfloor - \lceil\alpha\rceil + 1 & \alpha \le \beta,\\
[\alpha\,..\,\beta) & \lceil\beta\rceil - \lceil\alpha\rceil & \alpha \le \beta,\\
(\alpha\,..\,\beta] & \lfloor\beta\rfloor - \lfloor\alpha\rfloor & \alpha \le \beta,\\
(\alpha\,..\,\beta) & \lceil\beta\rceil - \lfloor\alpha\rfloor - 1 & \alpha < \beta.
\end{array}\tag{3.12}$$
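The four counting formulas of (3.12) translate directly into code. The following sketch is ours (function names included); it compares each formula with a brute-force count over exact rational endpoints.

```python
from fractions import Fraction
from math import floor, ceil

def count_closed(a, b):
    """Integers in [a..b], assuming a <= b."""
    return floor(b) - ceil(a) + 1

def count_left_closed(a, b):
    """Integers in [a..b), assuming a <= b."""
    return ceil(b) - ceil(a)

def count_right_closed(a, b):
    """Integers in (a..b], assuming a <= b."""
    return floor(b) - floor(a)

def count_open(a, b):
    """Integers in (a..b), assuming a < b."""
    return ceil(b) - floor(a) - 1

def brute(a, b, left_closed, right_closed):
    """Count integers in the interval by testing every candidate directly."""
    total = 0
    for n in range(floor(a) - 1, ceil(b) + 2):
        above = (a <= n) if left_closed else (a < n)
        below = (n <= b) if right_closed else (n < b)
        total += above and below
    return total

a, b = Fraction(-7, 3), Fraction(5, 2)
counts = (count_closed(a, b), count_left_closed(a, b),
          count_right_closed(a, b), count_open(a, b))
```

With these non-integer endpoints all four intervals contain the same five integers, −2 through 2; the formulas differ only when an endpoint is itself an integer.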
Now here’s a problem we can’t refuse. The Concrete Math Club has a
casino (open only to purchasers of this book) in which there’s a roulette wheel
with one thousand slots, numbered 1 to 1000. If the number n that comes up
on a spin is divisible by the floor of its cube root, that is, if
$$\lfloor\sqrt[3]{n}\rfloor \backslash n,$$
then it’s a winner and the house pays us $5; otherwise it’s a loser and we
must pay $1. (The notation a\b, read “a divides b,” means that b is an exact
multiple of a; Chapter 4 investigates this relation carefully.) Can we expect
to make money if we play this game?
We can compute the average winnings, that is, the amount we’ll win
(or lose) per play, by first counting the number W of winners and the number
L = 1000 − W of losers. If each number comes up once during 1000 plays,
we win 5W dollars and lose L dollars, so the average winnings will be
$$\frac{5W - L}{1000} = \frac{5W - (1000 - W)}{1000} = \frac{6W - 1000}{1000}.$$
If there are 167 or more winners, we have the advantage; otherwise the advantage
is with the house.
(A poll of the class at this point showed that 28 students thought it was a bad idea to play, 13 wanted to gamble, and the rest were too confused to answer.)
(So we hit them with the Concrete Math Club.)
How can we count the number of winners among 1 through 1000? It’s
not hard to spot a pattern. The numbers from 1 through $2^3 - 1 = 7$ are all
winners because $\lfloor\sqrt[3]{n}\rfloor = 1$ for each. Among the numbers $2^3 = 8$ through
$3^3 - 1 = 26$, only the even numbers are winners. And among $3^3 = 27$ through
$4^3 - 1 = 63$, only those divisible by 3 are. And so on.
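The whole setup can be analyzed systematically if we use the summation
techniques of Chapter 2, taking advantage of Iverson’s convention about
logical statements evaluating to 0 or 1:
$$\begin{aligned}
W &= \sum_{n=1}^{1000}[n \text{ is a winner}]\\
&= \sum_{1\le n\le1000}\bigl[\lfloor\sqrt[3]{n}\rfloor \backslash n\bigr] = \sum_{k,n}\bigl[k=\lfloor\sqrt[3]{n}\rfloor\bigr][k\backslash n][1\le n\le1000]\\
&= \sum_{k,m,n}\bigl[k^3\le n<(k+1)^3\bigr][n=km][1\le n\le1000]\\
&= 1+\sum_{k,m}\bigl[k^3\le km<(k+1)^3\bigr][1\le k<10]\\
&= 1+\sum_{k,m}\bigl[m\in[k^2\,..\,(k+1)^3/k)\bigr][1\le k<10]\\
&= 1+\sum_{1\le k<10}\bigl(\lceil k^2+3k+3+1/k\rceil-\lceil k^2\rceil\bigr)\\
&= 1+\sum_{1\le k<10}(3k+4) = 1+\frac{7+31}{2}\cdot 9 = 172.
\end{aligned}$$
True.
Where did you say this casino is?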
This derivation merits careful study. Notice that line 6 uses our formula
(3.12) for the number of integers in a half-open interval. The only “difficult”
maneuver is the decision made between lines 3 and 4 to treat n = 1000 as a
special case. (The inequality $k^3 \le n < (k+1)^3$ does not combine easily with
$1 \le n \le 1000$ when k = 10.) In general, boundary conditions tend to be the
most critical part of $\sum$-manipulations.
The bottom line says that W = 172; hence our formula for average winnings
per play reduces to $(6\cdot172 - 1000)/1000$ dollars, which is 3.2 cents. We
can expect to be about $3.20 richer after making 100 bets of $1 each. (Of
course, the house may have made some numbers more equal than others.)
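The count W = 172 is small enough to confirm by spinning the wheel once for every slot. The sketch below is a brute-force check, not the derivation above; `icbrt` is our own integer cube-root helper, written with integer arithmetic so that perfect cubes such as 1000 are handled exactly.

```python
from fractions import Fraction

def icbrt(n):
    """Floor of the cube root of a nonnegative integer, using integers only."""
    k = 0
    while (k + 1) ** 3 <= n:
        k += 1
    return k

def winners(N):
    """Count n in 1..N that are divisible by the floor of their cube root."""
    return sum(1 for n in range(1, N + 1) if n % icbrt(n) == 0)

W = winners(1000)
average_winnings = Fraction(6 * W - 1000, 1000)   # dollars per play
```

With W = 172 the average comes to 32/1000 of a dollar, which is the 3.2 cents per play computed above.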
The casino problem we just solved is a dressed-up version of the more
mundane question, “How many integers n, where $1 \le n \le 1000$, satisfy the relation
$\lfloor\sqrt[3]{n}\rfloor \backslash n$?” Mathematically the two questions are the same. But sometimes
it’s a good idea to dress up a problem. We get to use more vocabulary
(like “winners” and “losers”), which helps us to understand what’s going on.
Let’s get general. Suppose we change 1000 to 1000000, or to an even
larger number, N . (We assume that the casino has connections and can get a
bigger wheel.) Now how many winners are there?
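The same argument applies, but we need to deal more carefully with the
largest value of k, which we can call K for convenience:
$$K = \lfloor\sqrt[3]{N}\rfloor.$$
(Previously K was 10.) The total number of winners for general N comes to
$$\begin{aligned}
W &= \sum_{1\le k<K}(3k+4) + \sum_m [K^3 \le Km \le N]\\
&= \tfrac12(7+3K+1)(K-1) + \sum_m \bigl[m\in[K^2\,..\,N/K]\bigr]\\
&= \tfrac32K^2 + \tfrac52K - 4 + \sum_m \bigl[m\in[K^2\,..\,N/K]\bigr].
\end{aligned}$$
We know that the remaining sum is $\lfloor N/K\rfloor - \lceil K^2\rceil + 1 = \lfloor N/K\rfloor - K^2 + 1$;
hence the formula
$$W = \lfloor N/K\rfloor + \tfrac12K^2 + \tfrac52K - 3,\qquad K = \lfloor\sqrt[3]{N}\rfloor\tag{3.13}$$
gives the general answer for a wheel of size N.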
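Formula (3.13) can be compared with a direct count for small wheels and then evaluated for large ones. This is a sketch under our own naming (`icbrt`, `winners_formula`, `winners_brute`); the cube root is computed with integers because a floating-point cube root of a perfect cube such as $10^9$ may land just below the true value and give the wrong K.

```python
def icbrt(n):
    """Floor of the cube root of a nonnegative integer, using integers only."""
    k = 0
    while (k + 1) ** 3 <= n:
        k += 1
    return k

def winners_formula(N):
    """W from (3.13): floor(N/K) + K^2/2 + 5K/2 - 3, with K = floor(cbrt(N))."""
    K = icbrt(N)
    # K*K + 5*K = K*(K + 5) is always even, so the two halves sum to an integer.
    return N // K + (K * K + 5 * K) // 2 - 3

def winners_brute(N):
    """Direct count of n in 1..N divisible by the floor of their cube root."""
    return sum(1 for n in range(1, N + 1) if n % icbrt(n) == 0)

small = winners_formula(1000)          # the original wheel
million = winners_formula(10 ** 6)     # a wheel with a million slots
```

For N = 1000 the formula reproduces the 172 winners counted earlier, and it agrees with the brute-force count for every N checked below.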
The first two terms of this formula are approximately $N^{2/3} + \frac12N^{2/3} = \frac32N^{2/3}$,
and the other terms are much smaller in comparison, when N is large.
In Chapter 9 we’ll learn how to derive expressions like
$$W = \tfrac32N^{2/3} + O(N^{1/3}),$$
where $O(N^{1/3})$ stands for a quantity that is no more than a constant times
$N^{1/3}$. Whatever the constant is, we know that it’s independent of N; so for
large N the contribution of the O-term to W will be quite small compared
with $\frac32N^{2/3}$. For example, the following table shows how close $\frac32N^{2/3}$ is to W:
            N   (3/2)N^(2/3)          W   % error
        1,000          150.0        172    12.791
       10,000          696.2        746     6.670
      100,000         3231.7       3343     3.331
    1,000,000        15000.0      15247     1.620
   10,000,000        69623.8      70158     0.761
  100,000,000       323165.2     324322     0.357
1,000,000,000      1500000.0    1502497     0.166
It’s a pretty good approximation.
Approximate formulas are useful because they’re simpler than formulas
with floors and ceilings. However, the exact truth is often important,
too, especially for the smaller values of N that tend to occur in practice.
For example, the casino owner may have falsely assumed that there are only
$\frac32N^{2/3} = 150$ winners when N = 1000 (in which case there would be a 10¢
advantage for the house).
. . . without lots of generality . . .
“If x be an incommensurable number less than unity, one of the series of quantities
m/x, m/(1−x), where m is a whole number, can be found which shall lie between any
given consecutive integers, and but one such quantity can be found.”
-Rayleigh [245]
Right, because exactly one of the counts must increase when n increases by 1.
Our last application in this section looks at so-called spectra. We define
the spectrum of a real number $\alpha$ to be an infinite multiset of integers,
$$\mathrm{Spec}(\alpha) = \{\lfloor\alpha\rfloor, \lfloor2\alpha\rfloor, \lfloor3\alpha\rfloor, \ldots\}.$$
(A multiset is like a set but it can have repeated elements.) For example, the
spectrum of 1/2 starts out {0, 1, 1, 2, 2, 3, 3, ...}.
It’s easy to prove that no two spectra are equal, that is, that $\alpha \ne \beta$ implies
$\mathrm{Spec}(\alpha) \ne \mathrm{Spec}(\beta)$. For, assuming without loss of generality that $\alpha < \beta$,
there’s a positive integer m such that $m(\beta - \alpha) \ge 1$. (In fact, any
$m \ge \lceil 1/(\beta - \alpha)\rceil$ will do; but we needn’t show off our knowledge of floors and
ceilings all the time.) Hence $m\beta - m\alpha \ge 1$, and $\lfloor m\beta\rfloor > \lfloor m\alpha\rfloor$. Thus
$\mathrm{Spec}(\beta)$ has fewer than m elements $\le \lfloor m\alpha\rfloor$, while $\mathrm{Spec}(\alpha)$ has at least m.
Spectra have many beautiful properties. For example, consider the two
multisets
$$\begin{aligned}
\mathrm{Spec}(\sqrt2) &= \{1,2,4,5,7,8,9,11,12,14,15,16,18,19,21,22,24,\ldots\},\\
\mathrm{Spec}(2+\sqrt2) &= \{3,6,10,13,17,20,23,27,30,34,37,40,44,47,51,\ldots\}.
\end{aligned}$$
It’s easy to calculate $\mathrm{Spec}(\sqrt2)$ with a pocket calculator, and the nth element
of $\mathrm{Spec}(2+\sqrt2)$ is just 2n more than the nth element of $\mathrm{Spec}(\sqrt2)$, by (3.6).
A closer look shows that these two spectra are also related in a much more
surprising way: It seems that any number missing from one is in the other,
but that no number is in both! And it’s true: The positive integers are the
disjoint union of $\mathrm{Spec}(\sqrt2)$ and $\mathrm{Spec}(2+\sqrt2)$. We say that these spectra form
a partition of the positive integers.
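Before proving the partition property we can watch it happen. The sketch below computes both spectra exactly, using the fact that $\lfloor k\sqrt2\rfloor$ is the integer square root of $2k^2$; the function names and the cutoff of 1000 are our own choices.

```python
from math import isqrt

def spec_sqrt2(count):
    """First `count` elements of Spec(sqrt 2): floor(k*sqrt(2)) = isqrt(2*k*k)."""
    return [isqrt(2 * k * k) for k in range(1, count + 1)]

def spec_2_plus_sqrt2(count):
    """First `count` elements of Spec(2 + sqrt 2), pulling out 2k by (3.6)."""
    return [2 * k + isqrt(2 * k * k) for k in range(1, count + 1)]

limit = 1000
# The first 1000 elements of each spectrum already exceed the limit, so
# filtering them captures every element that is <= limit.
a = [v for v in spec_sqrt2(limit) if v <= limit]
b = [v for v in spec_2_plus_sqrt2(limit) if v <= limit]
is_partition = sorted(a + b) == list(range(1, limit + 1))
```

Every integer from 1 to 1000 shows up exactly once in the combined list, so `is_partition` is true; of course this is evidence, not a proof.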
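To prove this assertion, we will count how many of the elements of
$\mathrm{Spec}(\sqrt2)$ are $\le n$, and how many of the elements of $\mathrm{Spec}(2+\sqrt2)$ are $\le n$. If
the total is n, for each n, these two spectra do indeed partition the integers.
Let $\alpha$ be positive. The number of elements in $\mathrm{Spec}(\alpha)$ that are $\le n$ is
$$\begin{aligned}
N(\alpha,n) &= \sum_{k>0}\bigl[\lfloor k\alpha\rfloor \le n\bigr]\\
&= \sum_{k>0}\bigl[\lfloor k\alpha\rfloor < n+1\bigr]\\
&= \sum_{k>0}[k\alpha < n+1]\\
&= \sum_{k}\bigl[0<k<(n+1)/\alpha\bigr]\\
&= \lceil (n+1)/\alpha\rceil - 1.
\end{aligned}\tag{3.14}$$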
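The closed form (3.14) can be checked against direct enumeration when α is rational. (The spectra of interest here have irrational α, so this sketch only exercises the counting argument itself; the function names are ours.)

```python
from fractions import Fraction
from math import floor, ceil

def count_formula(alpha, n):
    """N(alpha, n) by (3.14): ceil((n + 1) / alpha) - 1, for alpha > 0."""
    return ceil(Fraction(n + 1) / alpha) - 1

def count_brute(alpha, n):
    """Count the k > 0 with floor(k * alpha) <= n, for alpha > 0 and n >= 0."""
    k = 1
    while floor(k * alpha) <= n:   # floor(k * alpha) never decreases as k grows
        k += 1
    return k - 1

half = Fraction(1, 2)
seven = count_formula(half, 3)     # Spec(1/2) = {0, 1, 1, 2, 2, 3, 3, 4, ...}
none = count_formula(5, 2)         # (n + 1)/alpha < 1, so nothing is counted
```

The second example is the subtle case mentioned in the discussion of the range k > 0: when (n + 1)/α is less than 1 the count is zero.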
This derivation has two special points of interest. First, it uses the law
$$m \le n \iff m < n+1,\qquad\text{integers m and n}\tag{3.15}$$
to change ‘$\le$’ to ‘<’, so that the floor brackets can be removed by (3.7).
Also, and this is more subtle, it sums over the range k > 0 instead of $k \ge 1$,
because $(n + 1)/\alpha$ might be less than 1 for certain n and $\alpha$. If we had tried
to apply (3.12) to determine the number of integers in $[1\,..\,(n+1)/\alpha)$, rather
than the number of integers in $(0\,..\,(n+1)/\alpha)$, we would have gotten the right
answer; but our derivation would have been faulty because the conditions of
applicability wouldn’t have been met.
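Good, we have a formula for $N(\alpha, n)$. Now we can test whether or not
$\mathrm{Spec}(\sqrt2)$ and $\mathrm{Spec}(2+\sqrt2)$ partition the positive integers, by testing whether
or not $N(\sqrt2, n) + N(2+\sqrt2, n) = n$ for all integers n > 0, using (3.14):
$$\begin{aligned}
&\Bigl\lceil\frac{n+1}{\sqrt2}\Bigr\rceil - 1 + \Bigl\lceil\frac{n+1}{2+\sqrt2}\Bigr\rceil - 1 = n\\
&\iff \Bigl\lfloor\frac{n+1}{\sqrt2}\Bigr\rfloor + \Bigl\lfloor\frac{n+1}{2+\sqrt2}\Bigr\rfloor = n, &&\text{by (3.2);}\\
&\iff \frac{n+1}{\sqrt2} - \Bigl\{\frac{n+1}{\sqrt2}\Bigr\} + \frac{n+1}{2+\sqrt2} - \Bigl\{\frac{n+1}{2+\sqrt2}\Bigr\} = n, &&\text{by (3.8).}
\end{aligned}$$
Everything simplifies now because of the neat identity
$$\frac{1}{\sqrt2} + \frac{1}{2+\sqrt2} = 1;$$
our condition reduces to testing whether or not
$$\Bigl\{\frac{n+1}{\sqrt2}\Bigr\} + \Bigl\{\frac{n+1}{2+\sqrt2}\Bigr\} = 1,$$
for all n > 0. And we win, because these are the fractional parts of two
noninteger numbers that add up to the integer n + 1. A partition it is.
3.3 FLOOR/CEILING RECURRENCES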
Floors and ceilings add an interesting new dimension to the study
of recurrence relations. Let’s look first at the recurrence
$$K_0 = 1;\qquad K_{n+1} = 1 + \min(2K_{\lfloor n/2\rfloor},\ 3K_{\lfloor n/3\rfloor}),\qquad\text{for } n \ge 0.\tag{3.16}$$
Thus, for example, $K_1$ is $1 + \min(2K_0, 3K_0) = 3$; the sequence begins 1, 3, 3,
4, 7, 7, 7, 9, 9, 10, 13, ... . One of the authors of this book has modestly
decided to call these the Knuth numbers.
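Recurrence (3.16) can be run forward mechanically. Here is a minimal sketch; the function name `knuth_numbers` is ours.

```python
def knuth_numbers(count):
    """Return [K_0, K_1, ..., K_{count-1}] from recurrence (3.16)."""
    K = [1]                                   # K_0 = 1
    for n in range(count - 1):
        # K_{n+1} = 1 + min(2 K_{floor(n/2)}, 3 K_{floor(n/3)})
        K.append(1 + min(2 * K[n // 2], 3 * K[n // 3]))
    return K

first_eleven = knuth_numbers(11)
```

The list `first_eleven` reproduces the values 1, 3, 3, 4, 7, 7, 7, 9, 9, 10, 13 quoted above.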
Exercise 25 asks for a proof or disproof that $K_n \ge n$, for all $n \ge 0$. The
first few K’s just listed do satisfy the inequality, so there’s a good chance that
it’s true in general. Let’s try an induction proof: The basis n = 0 comes
directly from the defining recurrence. For the induction step, we assume
that the inequality holds for all values up through some fixed nonnegative n,
and we try to show that $K_{n+1} \ge n + 1$. From the recurrence we know that
$K_{n+1} = 1 + \min(2K_{\lfloor n/2\rfloor}, 3K_{\lfloor n/3\rfloor})$. The induction hypothesis tells us that
$2K_{\lfloor n/2\rfloor} \ge 2\lfloor n/2\rfloor$ and $3K_{\lfloor n/3\rfloor} \ge 3\lfloor n/3\rfloor$. However, $2\lfloor n/2\rfloor$ can be as small
as n − 1, and $3\lfloor n/3\rfloor$ can be as small as n − 2. The most we can conclude
from our induction hypothesis is that $K_{n+1} \ge 1 + (n - 2)$; this falls far short
of $K_{n+1} \ge n + 1$.
We now have reason to worry about the truth of $K_n \ge n$, so let’s try to
disprove it. If we can find an n such that either $2K_{\lfloor n/2\rfloor} < n$ or $3K_{\lfloor n/3\rfloor} < n$,
or in other words such that
$$K_{\lfloor n/2\rfloor} < n/2 \qquad\text{or}\qquad K_{\lfloor n/3\rfloor} < n/3,$$
we will have $K_{n+1} < n + 1$. Can this be possible? We’d better not give the
answer away here, because that will spoil exercise 25.
Recurrence relations involving floors and/or ceilings arise often in com-
puter science, because algorithms based on the important technique of “divide
and conquer” often reduce a problem of size n to the solution of similar prob-
lems of integer sizes that are fractions of n. For example, one way to sort
n records, if n > 1, is to divide them into two approximately equal parts, one
of size $\lceil n/2\rceil$ and the other of size $\lfloor n/2\rfloor$. (Notice, incidentally, that
$$n = \lceil n/2\rceil + \lfloor n/2\rfloor; \tag{3.17}$$
this formula comes in handy rather often.) After each part has been sorted
separately (by the same method, applied recursively), we can merge the
records into their final order by doing at most $n - 1$ further comparisons.
Therefore the total number of comparisons performed is at most f(n), where
$$f(1) = 0; \qquad f(n) = f(\lceil n/2\rceil) + f(\lfloor n/2\rfloor) + n - 1, \quad\text{for } n > 1. \tag{3.18}$$
A solution to this recurrence appears in exercise 34.
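To see (3.18) at work, here is a small Python sketch, added for illustration and not taken from the original text. It evaluates f(n) directly from the recurrence and also counts the comparisons made by one straightforward merge sort; the names `f` and `merge_sort` and the particular merging loop are assumptions of this sketch.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def f(n):
    """Upper bound on comparisons from recurrence (3.18), for n >= 1."""
    if n == 1:
        return 0
    return f((n + 1) // 2) + f(n // 2) + n - 1   # ceil(n/2), floor(n/2)

def merge_sort(records):
    """Return (sorted copy, number of key comparisons actually performed)."""
    n = len(records)
    if n <= 1:
        return list(records), 0
    half = (n + 1) // 2                           # ceil(n/2) records go left
    left, c_left = merge_sort(records[:half])
    right, c_right = merge_sort(records[half:])
    merged, comparisons, i, j = [], c_left + c_right, 0, 0
    while i < len(left) and j < len(right):       # at most n - 1 comparisons
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons
```

For any input of n records the comparison count returned by `merge_sort` should never exceed `f(n)`, because each merge stops comparing as soon as one side is exhausted.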
The Josephus problem of Chapter 1 has a similar recurrence, which can
be cast in the form
$$J(1) = 1; \qquad J(n) = 2J(\lfloor n/2\rfloor) - (-1)^n, \quad\text{for } n > 1.$$
We’ve got more tools to work with than we had in Chapter 1, so let’s
consider the more authentic Josephus problem in which every third person is
eliminated, instead of every second. If we apply the methods that worked in
Chapter 1 to this more difficult problem, we wind up with a recurrence like
$$J_3(n) = \left\lceil \tfrac32 J_3\bigl(\lfloor \tfrac23 n\rfloor\bigr) + a_n \right\rceil \bmod n + 1,$$
where ‘mod’ is a function that we will be studying shortly, and where we have
$a_n = -2$, $+1$, or $-\tfrac12$ according as n mod 3 = 0, 1, or 2. But this recurrence
is too horrible to pursue.
There’s another approach to the Josephus problem that gives a much
better setup. Whenever a person is passed over, we can assign a new number.
Thus, 1 and 2 become n + 1 and n + 2, then 3 is executed; 4 and 5 become
n + 3 and n + 4, then 6 is executed; . . . ; 3k + 1 and 3k + 2 become n + 2k + 1
and n + 2k + 2, then 3k + 3 is executed; . . . then 3n is executed (or left to
survive). For example, when n = 10 the numbers are

       1   2   3   4   5   6   7   8   9  10
      11  12      13  14      15  16      17
      18          19  20          21      22
                  23  24                  25
                  26                      27
                  28
                  29
                  30

The kth person eliminated ends up with number 3k. So we can figure out who
the survivor is if we can figure out the original number of person number 3n.
If N > n, person number N must have had a previous number, and we
can find it as follows: We have N = n + 2k + 1 or N = n + 2k + 2, hence
$k = \lfloor (N - n - 1)/2\rfloor$; the previous number was 3k + 1 or 3k + 2, respectively.
That is, it was $3k + (N - n - 2k) = k + N - n$. Hence we can calculate the
survivor’s number $J_3(n)$ as follows:
$$\begin{aligned} &N := 3n;\\ &\textbf{while } N > n \textbf{ do } N := \left\lfloor\frac{N-n-1}{2}\right\rfloor + N - n;\\ &J_3(n) := N. \end{aligned}$$
This is not a closed form for $J_3(n)$; it’s not even a recurrence. But at least it
tells us how to calculate the answer reasonably fast, if n is large.
“Not too slow, not too fast,” -L. Armstrong
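The renumbering procedure translates almost word for word into code. The Python sketch below is an added illustration, not part of the original text; `josephus3` follows the while-loop on N, and `josephus_brute` is a hypothetical direct simulation included only so the two can be compared.

```python
def josephus3(n):
    """Survivor among n people when every third is eliminated (renumbering method)."""
    N = 3 * n
    while N > n:
        # undo one renumbering step: N had previous number floor((N-n-1)/2) + N - n
        N = (N - n - 1) // 2 + N - n
    return N

def josephus_brute(n, q):
    """Direct simulation of the circle; slow, used only as a cross-check."""
    people = list(range(1, n + 1))
    i = 0
    while len(people) > 1:
        i = (i + q - 1) % len(people)   # step to the q-th remaining person
        people.pop(i)
    return people[0]

print(josephus3(10), josephus_brute(10, 3))
```

For n = 10 the loop visits N = 30, 29, 28, 26, 23, 19, 13, 4, which is the survivor's column of numbers read from the bottom up.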
“Known” like, say, harmonic numbers. A. M. Odlyzko and H. S. Wilf have
shown that $D_n^{(3)} = \lfloor (\tfrac32)^n C \rfloor$, where $C \approx 1.622270503$.
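Fortunately there’s a way to simplify this algorithm if we use the variable
$D = 3n + 1 - N$ in place of N. (This change in notation corresponds to
assigning numbers from 3n down to 1, instead of from 1 up to 3n; it’s sort of
like a countdown.) Then the complicated assignment to N becomes
$$\begin{aligned} D &:= 3n + 1 - \left(\left\lfloor\frac{(3n+1-D) - n - 1}{2}\right\rfloor + (3n+1-D) - n\right)\\ &= n + D - \left\lfloor\frac{2n - D}{2}\right\rfloor = D - \left\lfloor\frac{-D}{2}\right\rfloor = D + \left\lceil\frac{D}{2}\right\rceil = \left\lceil \tfrac32 D\right\rceil, \end{aligned}$$
and we can rewrite the algorithm as follows:
$$\begin{aligned} &D := 1;\\ &\textbf{while } D \le 2n \textbf{ do } D := \lceil \tfrac32 D\rceil;\\ &J_3(n) := 3n + 1 - D. \end{aligned}$$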
Aha!
This looks much nicer, because n enters the calculation in a very simple
way. In fact, we can show by the same reasoning that the survivor $J_q(n)$ when
every qth person is eliminated can be calculated as follows:
$$\begin{aligned} &D := 1;\\ &\textbf{while } D \le (q-1)n \textbf{ do } D := \left\lceil \frac{q}{q-1} D\right\rceil;\\ &J_q(n) := qn + 1 - D. \end{aligned} \tag{3.19}$$
In the case q = 2 that we know so well, this makes D grow to $2^{m+1}$ when
$n = 2^m + l$; hence $J_2(n) = 2(2^m + l) + 1 - 2^{m+1} = 2l + 1$. Good.
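Recipe (3.19) needs only integer arithmetic. The Python sketch below is an added illustration, not part of the original text; the function name `josephus` is hypothetical, and the ceiling of qD/(q − 1) is computed with floor division on negated operands so that no floating point is involved. It assumes q ≥ 2 and n ≥ 1.

```python
def josephus(n, q):
    """Survivor among n people when every q-th is eliminated, following (3.19)."""
    D = 1
    while D <= (q - 1) * n:
        D = -(-q * D // (q - 1))      # ceil(q*D / (q-1)), exactly
    return q * n + 1 - D

# The case q = 2 should reproduce J_2(n) = 2l + 1 for n = 2^m + l.
for n in (1, 2, 3, 10, 100):
    l = n - 2 ** (n.bit_length() - 1)
    print(n, josephus(n, 2), 2 * l + 1)
```

The loop runs about $\log n / \log\bigl(q/(q-1)\bigr)$ times, which is why the method is reasonably fast even when n is large.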
The recipe in (3.19) computes a sequence of integers that can be defined
by the following recurrence:
$$D_0^{(q)} = 1; \qquad D_n^{(q)} = \left\lceil \frac{q}{q-1} D_{n-1}^{(q)} \right\rceil, \quad\text{for } n > 0. \tag{3.20}$$
These numbers don’t seem to relate to any familiar functions in a simple
way, except when q = 2; hence they probably don’t have a nice closed form.
But if we’re willing to accept the sequence $D_n^{(q)}$ as “known,” then it’s easy to
describe the solution to the generalized Josephus problem: The survivor $J_q(n)$
is $qn + 1 - D_k^{(q)}$, where k is as small as possible such that $D_k^{(q)} > (q-1)n$.
3.4 ‘MOD’: THE BINARY OPERATION
The quotient of n divided by m is $\lfloor n/m\rfloor$, when m and n are positive
integers. It’s handy to have a simple notation also for the remainder of this
division, and we call it ‘n mod m’. The basic formula
$$n = m\underbrace{\lfloor n/m\rfloor}_{\text{quotient}} + \underbrace{n \bmod m}_{\text{remainder}}$$
tells us that we can express n mod m as $n - m\lfloor n/m\rfloor$. We can generalize this
to negative integers, and in fact to arbitrary real numbers:
$$x \bmod y = x - y\lfloor x/y\rfloor, \quad\text{for } y \ne 0. \tag{3.21}$$
This defines ‘mod’ as a binary operation, just as addition and subtraction are
binary operations. Mathematicians have used mod this way informally for a
long time, taking various quantities mod 10, mod 2π, and so on, but only in
the last twenty years has it caught on formally. Old notion, new notation.
We can easily grasp the intuitive meaning of x mod y, when x and y
are positive real numbers, if we imagine a circle of circumference y whose
points have been assigned real numbers in the interval [0 . . y). If we travel a
distance x around the circle, starting at 0, we end up at x mod y. (And the
number of times we encounter 0 as we go is $\lfloor x/y\rfloor$.)
When x or y is negative, we need to look at the definition carefully in
order to see exactly what it means. Here are some integer-valued examples:
$$\begin{aligned} 5 \bmod 3 &= 5 - 3\lfloor 5/3\rfloor = 2;\\ 5 \bmod -3 &= 5 - (-3)\lfloor 5/(-3)\rfloor = -1;\\ -5 \bmod 3 &= -5 - 3\lfloor -5/3\rfloor = 1;\\ -5 \bmod -3 &= -5 - (-3)\lfloor -5/(-3)\rfloor = -2. \end{aligned}$$
Why do they call it ‘mod’: The Binary Operation? Stay tuned to find out in
the next, exciting, chapter!
Beware of computer languages that use another definition.
The number after ‘mod’ is called the modulus; nobody has yet decided what
to call the number before ‘mod’. In applications, the modulus is usually
positive, but the definition makes perfect sense when the modulus is negative.
In both cases the value of x mod y is between 0 and the modulus:
$$0 \le x \bmod y < y, \quad\text{for } y > 0;$$
$$0 \ge x \bmod y > y, \quad\text{for } y < 0.$$
How about calling the other number the modumor?
What about y = 0? Definition (3.21) leaves this case undefined, in order to
avoid division by zero, but to be complete we can define
$$x \bmod 0 = x. \tag{3.22}$$
This convention preserves the property that x mod y always differs from x by
a multiple of y. (It might seem more natural to make the function continuous
at 0, by defining $x \bmod 0 = \lim_{y\to0} x \bmod y = 0$. But we’ll see in Chapter 4
that this would be much less useful. Continuity is not an important aspect
of the mod operation.)
There was a time in the 70s when ‘mod’ was the fashion. Maybe the new
mumble function should be called ‘punk’?
No, I like ‘mumble’.
The remainder, eh?
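Definitions (3.21) and (3.22) can be tried out directly. The following Python sketch is an added illustration, not part of the original text; the helper name `mod` is ours, and exact rational arithmetic (`fractions.Fraction`) is an assumption made so that the floor is never disturbed by rounding. Python's own `%` operator happens to follow the floor definition, while `math.fmod` truncates toward zero and so gives a result with the sign of x.

```python
from fractions import Fraction
from math import floor, fmod

def mod(x, y):
    """x mod y as defined by (3.21), with the convention (3.22) for y = 0."""
    if y == 0:
        return x
    x, y = Fraction(x), Fraction(y)
    return x - y * floor(x / y)

for x, y in [(5, 3), (5, -3), (-5, 3), (-5, -3)]:
    # columns: x, y, definition (3.21), Python's %, truncating fmod
    print(x, y, mod(x, y), x % y, fmod(x, y))
```

Running the loop reproduces the four integer-valued examples above in the third and fourth columns; the last column shows the "other definition" that the margin warns about.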
We’ve already seen one special case of mod in disguise, when we wrote x
in terms of its integer and fractional parts, $x = \lfloor x\rfloor + \{x\}$. The fractional part
can also be written x mod 1, because we have
$$x = \lfloor x\rfloor + x \bmod 1.$$
Notice that parentheses aren’t needed in this formula; we take mod to bind
more tightly than addition or subtraction.
The floor function has been used to define mod, and the ceiling function
hasn’t gotten equal time. We could perhaps use the ceiling to define a mod
analog like
$$x \mathbin{\text{mumble}} y = y\lceil x/y\rceil - x;$$
in our circle analogy this represents the distance the traveler needs to continue,
after going a distance x, to get back to the starting point 0. But of course
we’d need a better name than ‘mumble’. If sufficient applications come along,
an appropriate name will probably suggest itself.
The distributive law is mod’s most important algebraic property: We
have
c(x mod y) = (cx) mod (cy)
(3.23)
for all real c, x, and y. (Those who like mod to bind less tightly than multi-
plication may remove the parentheses from the right side here, too.) It’s easy
to prove this law from definition (3.21), since
$$c(x \bmod y) = c(x - y\lfloor x/y\rfloor) = cx - cy\lfloor cx/cy\rfloor = cx \bmod cy,$$
if $cy \ne 0$; and the zero-modulus cases are trivially true. Our four examples
using ±5 and ±3 illustrate this law twice, with c = -1. An identity like
(3.23) is reassuring, because it gives us reason to believe that ‘mod’ has not
been defined improperly.
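The distributive law (3.23) can be spot-checked mechanically, including the zero-modulus cases. The Python sketch below is an added illustration, not part of the original text; `mod` and `distributive_law_holds` are hypothetical names, and the test values are restricted to exact fractions so that equality can be tested exactly.

```python
from fractions import Fraction
from itertools import product
from math import floor

def mod(x, y):
    """x mod y per (3.21), with x mod 0 = x per (3.22)."""
    return x if y == 0 else x - y * floor(x / y)

def distributive_law_holds(values):
    """Check c(x mod y) == (cx) mod (cy) for every triple (c, x, y) drawn from values."""
    return all(c * mod(x, y) == mod(c * x, c * y)
               for c, x, y in product(values, repeat=3))

quarters = [Fraction(p, 4) for p in range(-8, 9)]   # includes 0 and negative values
print(distributive_law_holds(quarters))
```

A finite check like this proves nothing, of course, but it would expose a definition of mod that mishandles negative moduli.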
In the remainder of this section, we’ll consider an application in which
‘mod’ turns out to be helpful although it doesn’t play a central role. The
problem arises frequently in a variety of situations: We want to partition
n things into m groups as equally as possible.
Suppose, for example, that we have n short lines of text that we’d like
to arrange in m columns. For aesthetic reasons, we want the columns to be
arranged in decreasing order of length (actually nonincreasing order); and the
lengths should be approximately the same; no two columns should differ by
more than one line’s worth of text. If 37 lines of text are being divided into
five columns, we would therefore prefer the arrangement on the right:
[Figure: two five-column layouts, with column lengths 8, 8, 8, 8, 5 on the left and 8, 8, 7, 7, 7 on the right.]
Furthermore we want to distribute the lines of text columnwise-first decid-
ing how many lines go into the first column and then moving on to the second,
the third, and so on-because that’s the way people read. Distributing row
by row would give us the correct number of lines in each column, but the
ordering would be wrong. (We would get something like the arrangement on
the right, but column 1 would contain lines 1, 6, 11, . . . , 36, instead of lines
1, 2, 3, . . . , 8 as desired.)
A row-by-row distribution strategy can’t be used, but it does tell us how
many lines to put in each column. If n is not a multiple of m, the
row-by-row procedure makes it clear that the long columns should each contain
$\lceil n/m\rceil$ lines, and the short columns should each contain $\lfloor n/m\rfloor$. There will
be exactly n mod m long columns (and, as it turns out, there will be exactly
n mumble m short ones).
Let’s generalize the terminology and talk about ‘things’ and ‘groups’
instead of ‘lines’ and ‘columns’.
We have just decided that the first group
should contain $\lceil n/m\rceil$ things; therefore the following sequential distribution
scheme ought to work: To distribute n things into m groups, when m > 0,
put $\lceil n/m\rceil$ things into one group, then use the same procedure recursively to
put the remaining $n' = n - \lceil n/m\rceil$ things into $m' = m - 1$ additional groups.
For example, if n = 314 and m = 6, the distribution goes like this:

    remaining things   remaining groups   ⌈things/groups⌉
          314                 6                 53
          261                 5                 53
          208                 4                 52
          156                 3                 52
          104                 2                 52
           52                 1                 52

It works. We get groups of approximately the same size, even though the
divisor keeps changing.
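The sequential scheme is short enough to write out. The following Python sketch is an added illustration, not part of the original text; the name `distribute` is hypothetical, and the recursion is unrolled into a loop, which changes nothing about the sizes produced.

```python
def distribute(n, m):
    """Sizes of m groups holding n things, computed one group at a time."""
    sizes = []
    while m > 0:
        size = -(-n // m)     # ceil(n/m) things go into the next group
        sizes.append(size)
        n -= size             # n' = n - ceil(n/m)
        m -= 1                # m' = m - 1
    return sizes

print(distribute(314, 6))
print(distribute(37, 5))
```

The first call retraces the table above; the second gives the column lengths preferred for 37 lines in five columns.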
Why does it work? In general we can suppose that n = qm + r, where
$q = \lfloor n/m\rfloor$ and r = n mod m. The process is simple if r = 0: We put
$\lceil n/m\rceil = q$ things into the first group and replace n by $n' = n - q$, leaving
$n' = qm'$ things to put into the remaining $m' = m - 1$ groups. And if
r > 0, we put $\lceil n/m\rceil = q + 1$ things into the first group and replace n
by $n' = n - q - 1$, leaving $n' = qm' + r - 1$ things for subsequent groups.
The new remainder is $r' = r - 1$, but q stays the same. It follows that there
will be r groups with q + 1 things, followed by $m - r$ groups with q things.
How many things are in the kth group? We’d like a formula that gives
$\lceil n/m\rceil$ when $k \le n \bmod m$, and $\lfloor n/m\rfloor$ otherwise. It’s not hard to verify
that
$$\left\lceil \frac{n-k+1}{m} \right\rceil$$
has the desired properties, because this reduces to $q + \lceil (r-k+1)/m\rceil$ if we
write n = qm + r as in the preceding paragraph; here $q = \lfloor n/m\rfloor$. We have
$\lceil (r-k+1)/m\rceil = [k \le r]$, if $1 \le k \le m$ and $0 \le r < m$. Therefore we can
write an identity that expresses the partition of n into m as-equal-as-possible
parts in nonincreasing order:
$$n = \left\lceil\frac{n}{m}\right\rceil + \left\lceil\frac{n-1}{m}\right\rceil + \cdots + \left\lceil\frac{n-m+1}{m}\right\rceil. \tag{3.24}$$
This identity is valid for all positive integers m, and for all integers n (whether
positive, negative, or zero). We have already encountered the case m = 2 in
(3.17), although we wrote it in a slightly different form, $n = \lceil n/2\rceil + \lfloor n/2\rfloor$.
If we had wanted the parts to be in nondecreasing order, with the small
groups coming before the larger ones, we could have proceeded in the same
way but with $\lfloor n/m\rfloor$ things in the first group. Then we would have derived
the corresponding identity
$$n = \left\lfloor\frac{n}{m}\right\rfloor + \left\lfloor\frac{n+1}{m}\right\rfloor + \cdots + \left\lfloor\frac{n+m-1}{m}\right\rfloor. \tag{3.25}$$
It’s possible to convert between (3.25) and (3.24) by using either (3.4) or the
identity of exercise 12.
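Both partition identities hold for negative n as well, which is easy to overlook. The Python sketch below is an added illustration, not part of the original text; the helper names are hypothetical. It lists the m parts given by the ceiling form and by the floor form so that their sums can be compared with n.

```python
def ceil_div(a, b):
    """Ceiling of a/b for integers a and b > 0."""
    return -(-a // b)

def parts_nonincreasing(n, m):
    """The m terms ceil((n-k+1)/m), k = 1..m, whose sum should be n."""
    return [ceil_div(n - k + 1, m) for k in range(1, m + 1)]

def parts_nondecreasing(n, m):
    """The m terms floor((n+k-1)/m), k = 1..m, whose sum should be n."""
    return [(n + k - 1) // m for k in range(1, m + 1)]

print(parts_nonincreasing(37, 5), parts_nondecreasing(37, 5))
print(parts_nonincreasing(-7, 3), parts_nondecreasing(-7, 3))
```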
Some claim that it’s too dangerous to replace anything by an mx.
Now if we replace n in (3.25) by $\lfloor mx\rfloor$, and apply rule (3.11) to remove
floors inside of floors, we get an identity that holds for all real x:
$$\lfloor mx\rfloor = \lfloor x\rfloor + \left\lfloor x + \frac1m\right\rfloor + \cdots + \left\lfloor x + \frac{m-1}{m}\right\rfloor. \tag{3.26}$$
This is rather amazing, because the floor function is an integer approximation
of a real value, but the single approximation on the left equals the sum of a
bunch of them on the right. If we assume that $\lfloor x\rfloor$ is roughly $x - \frac12$ on the
average, the left-hand side is roughly $mx - \frac12$, while the right-hand side comes
to roughly $(x - \frac12) + (x - \frac12 + \frac1m) + \cdots + (x - \frac12 + \frac{m-1}{m}) = mx - \frac12$; the sum of
all these rough approximations turns out to be exact!
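Identity (3.26) is pleasant to test because both sides are integers. The following Python sketch is an added illustration, not part of the original text; `floor_sum` is a hypothetical name, and x is restricted to exact fractions so that no rounding error can creep into the floors.

```python
from fractions import Fraction
from math import floor

def floor_sum(x, m):
    """Right-hand side of (3.26): the sum of floor(x + k/m) for 0 <= k < m."""
    return sum(floor(x + Fraction(k, m)) for k in range(m))

x = Fraction(-17, 7)
for m in (1, 2, 3, 10):
    print(m, floor(m * x), floor_sum(x, m))   # the two values agree for each m
```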
3.5 FLOOR/CEILING SUMS
Equation (3.26) demonstrates that it’s possible to get a closed form
for at least one kind of sum that involves $\lfloor\;\rfloor$. Are there others? Yes. The
trick that usually works in such cases is to get rid of the floor or ceiling by
introducing a new variable.
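For example, let’s see if it’s possible to do the sum
$$\sum_{0\le k<n} \lfloor\sqrt{k}\rfloor$$
in closed form. One idea is to introduce the variable $m = \lfloor\sqrt{k}\rfloor$; we can do
this “mechanically” by proceeding as we did in the roulette problem:
$$\begin{aligned}
\sum_{0\le k<n} \lfloor\sqrt{k}\rfloor &= \sum_{k,m\ge0} m\,[k<n]\,[m = \lfloor\sqrt{k}\rfloor]\\
&= \sum_{k,m\ge0} m\,[k<n]\,[m \le \sqrt{k} < m+1]\\
&= \sum_{k,m\ge0} m\,[k<n]\,[m^2 \le k < (m+1)^2]\\
&= \sum_{k,m\ge0} m\,[m^2 \le k < (m+1)^2 \le n] + \sum_{k,m\ge0} m\,[m^2 \le k < n < (m+1)^2].
\end{aligned}$$
Once again the boundary conditions are a bit delicate. Let’s assume first that
$n = a^2$ is a perfect square. Then the second sum is zero, and the first can be
evaluated by our usual routine:
$$\begin{aligned}
\sum_{k,m\ge0} m\,[m^2 \le k < (m+1)^2 \le a^2] &= \sum_{m\ge0} m\bigl((m+1)^2 - m^2\bigr)[m+1 \le a]\\
&= \sum_{m\ge0} m(2m+1)[m<a]\\
&= \sum_{m\ge0} \bigl(2m^{\underline2} + 3m^{\underline1}\bigr)[m<a]\\
&= \sum\nolimits_0^a \bigl(2m^{\underline2} + 3m^{\underline1}\bigr)\,\delta m\\
&= \tfrac23 a(a-1)(a-2) + \tfrac32 a(a-1) = \tfrac16(4a+1)a(a-1).
\end{aligned}$$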
Falling powers
make the sum come
tumbling down.
Warning: This stuff
is fairly advanced.
Better skim the
next two pages on
first reading; they
aren't crucial.
-Friendly TA
Start
Skimming
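In the general case we can let $a = \lfloor\sqrt{n}\rfloor$; then we merely need to add
the terms for $a^2 \le k < n$, which are all equal to a, so they sum to $(n - a^2)a$.
This gives the desired closed form,
$$\sum_{0\le k<n} \lfloor\sqrt{k}\rfloor = na - \tfrac13 a^3 - \tfrac12 a^2 - \tfrac16 a, \qquad a = \lfloor\sqrt{n}\rfloor. \tag{3.27}$$
Another approach to such sums is to replace an expression of the form
$\lfloor x\rfloor$ by $\sum_j [1 \le j \le x]$; this is legal whenever $x \ge 0$. Here’s how that method
works in the sum of ⌊square roots⌋, if we assume for convenience that $n = a^2$:
$$\begin{aligned}
\sum_{0\le k<n} \lfloor\sqrt{k}\rfloor &= \sum_{j,k} [1 \le j \le \sqrt{k}]\,[0 \le k < a^2]\\
&= \sum_{1\le j<a} \sum_k [j^2 \le k < a^2]\\
&= \sum_{1\le j<a} (a^2 - j^2) = a^3 - \tfrac13 a(a+\tfrac12)(a+1).
\end{aligned}$$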
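Closed form (3.27) can be compared against the sum it replaces. The Python sketch below is an added illustration, not part of the original text; `sqrt_floor_sum` is a hypothetical name. It uses `math.isqrt` for $\lfloor\sqrt{n}\rfloor$ and rewrites $\frac13a^3 + \frac12a^2 + \frac16a$ as $a(a+1)(2a+1)/6$ so that everything stays in integers.

```python
from math import isqrt

def sqrt_floor_sum(n):
    """Closed form (3.27) for the sum of floor(sqrt(k)) over 0 <= k < n."""
    a = isqrt(n)
    # a^3/3 + a^2/2 + a/6 = a(a+1)(2a+1)/6, always an integer
    return n * a - a * (a + 1) * (2 * a + 1) // 6

n = 1000
print(sqrt_floor_sum(n), sum(isqrt(k) for k in range(n)))   # both columns agree
```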
Now here’s another example where a change of variable leads to a transformed
sum. A remarkable theorem was discovered independently by three
mathematicians, Bohl [28], Sierpiński [265], and Weyl [300], at about the
same time in 1909: If α is irrational then the fractional parts {nα} are very
uniformly distributed between 0 and 1, as $n \to \infty$. One way to state this is that
$$\lim_{n\to\infty} \frac1n \sum_{0\le k<n} f(\{k\alpha\}) = \int_0^1 f(x)\,dx \tag{3.28}$$
for all irrational α and all functions f that are continuous almost everywhere.
For example, the average value of {nα} can be found by setting f(x) = x; we
get $\frac12$. (That’s exactly what we might expect; but it’s nice to know that it is
really, provably true, no matter how irrational α is.)
The theorem of Bohl, Sierpiński, and Weyl is proved by approximating
f(x) above and below by “step functions,” which are linear combinations of
the simple functions
$$f_v(x) = [0 \le x < v]$$
when $0 \le v \le 1$. Our purpose here is not to prove the theorem; that’s a job
for calculus books. But let’s try to figure out the basic reason why it holds,
by seeing how well it works in the special case $f(x) = f_v(x)$. In other words,
let’s try to see how close the sum
$$\sum_{0\le k<n} [\{k\alpha\} < v]$$
gets to the “ideal” value nv, when n is large and α is irrational.
For this purpose we define the discrepancy D(α, n) to be the maximum
absolute value, over all $0 \le v \le 1$, of the sum
$$s(\alpha, n, v) = \sum_{0\le k<n} \bigl([\{k\alpha\} < v] - v\bigr). \tag{3.29}$$
Our goal is to show that D(α, n) is “not too large” when compared with n,
by showing that |s(α, n, v)| is always reasonably small.
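A numerical experiment gives some feeling for how small s(α, n, v) stays. The Python sketch below is an added illustration, not part of the original text. It works in floating point with α = √2, and `discrepancy_estimate` maximizes |s| only over a finite grid of v, so it is merely a lower estimate of D(α, n); both choices, and both function names, are assumptions of this sketch.

```python
from math import sqrt

def s(alpha, n, v):
    """The sum (3.29): how far the count of {k*alpha} < v strays from n*v."""
    return sum(((k * alpha) % 1.0 < v) - v for k in range(n))

def discrepancy_estimate(alpha, n, steps=100):
    """Maximum of |s(alpha, n, v)| over v = 0, 1/steps, ..., 1 (a grid, not all v)."""
    return max(abs(s(alpha, n, j / steps)) for j in range(steps + 1))

alpha = sqrt(2)
for n in (100, 1000, 2000):
    print(n, discrepancy_estimate(alpha, n, steps=50))
```

If the theorem is to be believed, the printed estimates should grow far more slowly than n itself.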
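First we can rewrite s(α, n, v) in simpler form, then introduce a new
index variable j:
$$\begin{aligned}
\sum_{0\le k<n} \bigl([\{k\alpha\}<v] - v\bigr) &= \sum_{0\le k<n} \bigl(\lfloor k\alpha\rfloor - \lfloor k\alpha - v\rfloor - v\bigr)\\
&= -nv + \sum_{0\le k<n} \sum_j [k\alpha - v < j \le k\alpha]\\
&= -nv + \sum_{0\le j<\lceil n\alpha\rceil} \sum_{k<n} [j\alpha^{-1} \le k < (j+v)\alpha^{-1}].
\end{aligned}$$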
If we’re lucky, we can do the sum on k. But we ought to introduce some
new variables, so that the formula won’t be such a mess. Without loss of
generality, we can assume that 0 < α < 1; let us write
$$a = \lfloor\alpha^{-1}\rfloor, \qquad \alpha^{-1} = a + \alpha';$$
$$b = \lceil v\alpha^{-1}\rceil, \qquad v\alpha^{-1} = b - v'.$$
Thus $\alpha' = \{\alpha^{-1}\}$ is the fractional part of $\alpha^{-1}$, and v′ is the mumble-fractional
part of $v\alpha^{-1}$.
Right, name and conquer.
The change of variable from k to j is the main point. -Friendly TA
Once again the boundary conditions are our only source of grief. For
now, let’s forget the restriction ‘k < n’ and evaluate the sum on k without it:
$$\begin{aligned}
\sum_k \bigl[k \in [\,j\alpha^{-1} \,..\, (j+v)\alpha^{-1})\bigr] &= \lceil (j+v)(a+\alpha')\rceil - \lceil j(a+\alpha')\rceil\\
&= b + \lceil j\alpha' - v'\rceil - \lceil j\alpha'\rceil.
\end{aligned}$$
OK, that’s pretty simple; we plug it in and plug away:
$$s(\alpha,n,v) = -nv + \lceil n\alpha\rceil b + \sum_{0\le j<\lceil n\alpha\rceil} \bigl(\lceil j\alpha' - v'\rceil - \lceil j\alpha'\rceil\bigr) - S, \tag{3.30}$$
where S is a correction for the cases with $k \ge n$ that we have failed to exclude.
The quantity jα′ will never be an integer, since α (hence α′) is irrational; and
$j\alpha' - v'$ will be an integer for at most one value of j. So we can change the
ceiling terms to floors:
$$s(\alpha,n,v) = -nv + \lceil n\alpha\rceil b - \sum_{0\le j<\lceil n\alpha\rceil} \bigl(\lfloor j\alpha'\rfloor - \lfloor j\alpha' - v'\rfloor\bigr) - S + [0 \text{ or } 1].$$
(The formula [0 or 1] stands for something that’s either 0 or 1; we
needn’t commit ourselves, because the details don’t really matter.)
Stop Skimming
Interesting. Instead of a closed form, we’re getting a sum that looks rather
like s(α, n, v) but with different parameters: α′ instead of α, $\lceil n\alpha\rceil$ instead
of n, and v′ instead of v. So we’ll have a recurrence for s(α, n, v), which
(hopefully) will lead to a recurrence for the discrepancy D(α, n). This means
we want to get
$$s(\alpha', \lceil n\alpha\rceil, v') = \sum_{0\le j<\lceil n\alpha\rceil} \bigl(\lfloor j\alpha'\rfloor - \lfloor j\alpha' - v'\rfloor - v'\bigr)$$
into the act:
$$s(\alpha,n,v) = -nv + \lceil n\alpha\rceil b - \lceil n\alpha\rceil v' - s(\alpha', \lceil n\alpha\rceil, v') - S + [0 \text{ or } 1].$$
Recalling that $b - v' = v\alpha^{-1}$, we see that everything will simplify beautifully
if we replace $\lceil n\alpha\rceil(b - v')$ by $n\alpha(b - v') = nv$:
$$s(\alpha,n,v) = -s(\alpha', \lceil n\alpha\rceil, v') - S + \epsilon + [0 \text{ or } 1].$$
Here ε is a positive error of at most $v\alpha^{-1}$. Exercise 18 proves that S is,
likewise, between 0 and $\alpha^{-1}$. We can also remove the term for
$j = \lceil n\alpha\rceil - 1 = \lfloor n\alpha\rfloor$ from the sum, since it contributes either v′ or v′ - 1. Hence, if we take
the maximum of absolute values over all v, we get
$$D(\alpha,n) \le D(\alpha', \lfloor\alpha n\rfloor) + \alpha^{-1} + 2. \tag{3.31}$$
The methods we’ll learn in succeeding chapters will allow us to conclude
from this recurrence that D(ol,n) is always much smaller than n, when n is
sufficiently large. Hence the theorem (3.28) is not only true, it can also be
strengthened: Convergence to the limit is very fast.
Whew; that was quite an exercise in manipulation of sums, floors, and
ceilings. Readers who are not accustomed to “proving that errors are small”
might find it hard to believe that anybody would have the courage to keep
going, when faced with such weird-looking sums. But actually, a second look
shows that there’s a simple motivating thread running through the whole
calculation. The main idea is that a certain sum s(α, n, v) of n terms can be
reduced to a similar sum of at most $\lceil\alpha n\rceil$ terms. Everything else cancels out
except for a small residual left over from terms near the boundaries.
Let’s take a deep breath now and do one more sum, which is not trivial
but has the great advantage (compared with what we’ve just been doing) that
it comes out in closed form so that we can easily check the answer. Our goal
now will be to generalize the sum in (3.26) by finding an expression for
$$\sum_{0\le k<m} \left\lfloor \frac{nk + x}{m} \right\rfloor, \qquad\text{integer } m > 0,\ \text{integer } n.$$
Is this a harder sum of floors, or a sum of harder floors?
Finding a closed form for this sum is tougher than what we’ve done so far
(except perhaps for the discrepancy problem we just looked at). But it’s
instructive, so we’ll hack away at it for the rest of this chapter.
Be forewarned: This is the beginning of a pattern, in that the last part of
the chapter consists of the solution of some long, difficult problem, with little
more motivation than curiosity. -Students
Touché. But c’mon, gang, do you always need to be told about applications
before you can get interested in something? This sum arises, for example,
in the study of random number generation and testing. But mathematicians
looked at it long before computers came along, because they found it natural to
ask if there’s a way to sum arithmetic progressions that have been “floored.”
-Your instructor
As usual, especially with tough problems, we start by looking at small
cases. The special case n = 1 is (3.26), with x replaced by x/m:
$$\left\lfloor\frac{x}{m}\right\rfloor + \left\lfloor\frac{1+x}{m}\right\rfloor + \cdots + \left\lfloor\frac{m-1+x}{m}\right\rfloor = \lfloor x\rfloor.$$
And as in Chapter 1, we find it useful to get more data by generalizing
downwards to the case n = 0:
$$\left\lfloor\frac{x}{m}\right\rfloor + \left\lfloor\frac{x}{m}\right\rfloor + \cdots + \left\lfloor\frac{x}{m}\right\rfloor = m\left\lfloor\frac{x}{m}\right\rfloor.$$
Our problem has two parameters, m and n; let’s look at some small cases
for m. When m = 1 there’s just a single term in the sum and its value is $\lfloor x\rfloor$.
When m = 2 the sum is $\lfloor x/2\rfloor + \lfloor (x+n)/2\rfloor$. We can remove the interaction
between x and n by removing n from inside the floor function, but to do that
we must consider even and odd n separately. If n is even, n/2 is an integer,
so we can remove it from the floor:
$$\left\lfloor\frac x2\right\rfloor + \left(\left\lfloor\frac x2\right\rfloor + \frac n2\right) = 2\left\lfloor\frac x2\right\rfloor + \frac n2.$$
If n is odd, $(n-1)/2$ is an integer so we get
$$\left\lfloor\frac x2\right\rfloor + \left(\left\lfloor\frac{x+1}{2}\right\rfloor + \frac{n-1}{2}\right) = \lfloor x\rfloor + \frac{n-1}{2}.$$
The last step follows from (3.26) with m = 2.
These formulas for even and odd n slightly resemble those for n = 0 and 1,
but no clear pattern has emerged yet; so we had better continue exploring
some more small cases. For m = 3 the sum is
$$\left\lfloor\frac x3\right\rfloor + \left\lfloor\frac{x+n}{3}\right\rfloor + \left\lfloor\frac{x+2n}{3}\right\rfloor,$$
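and we consider three cases for n: Either it’s a multiple of 3, or it’s 1 more
than a multiple, or it’s 2 more. That is, n mod 3 = 0, 1, or 2. If n mod 3 = 0
then n/3 and 2n/3 are integers, so the sum is
$$\left\lfloor\frac x3\right\rfloor + \left(\left\lfloor\frac x3\right\rfloor + \frac n3\right) + \left(\left\lfloor\frac x3\right\rfloor + \frac{2n}{3}\right) = 3\left\lfloor\frac x3\right\rfloor + n.$$
If n mod 3 = 1 then $(n-1)/3$ and $(2n-2)/3$ are integers, so we have
$$\left\lfloor\frac x3\right\rfloor + \left(\left\lfloor\frac{x+1}{3}\right\rfloor + \frac{n-1}{3}\right) + \left(\left\lfloor\frac{x+2}{3}\right\rfloor + \frac{2n-2}{3}\right) = \lfloor x\rfloor + n - 1.$$
Again this last step follows from (3.26), this time with m = 3. And finally, if
n mod 3 = 2 then
$$\left\lfloor\frac x3\right\rfloor + \left(\left\lfloor\frac{x+2}{3}\right\rfloor + \frac{n-2}{3}\right) + \left(\left\lfloor\frac{x+1}{3}\right\rfloor + \frac{2n-1}{3}\right) = \lfloor x\rfloor + n - 1.$$
“Inventive genius requires pleasurable mental activity as a condition for its
vigorous exercise. ‘Necessity is the mother of invention’ is a silly proverb.
‘Necessity is the mother of futile dodges’ is much nearer to the truth.
The basis of the growth of modern invention is science, and science is almost
wholly the outgrowth of pleasurable intellectual curiosity.”
-A. N. Whitehead [303]
The left hemispheres of our brains have finished the case m = 3, but the
right hemispheres still can’t recognize the pattern, so we proceed to m = 4:
$$\left\lfloor\frac x4\right\rfloor + \left\lfloor\frac{x+n}{4}\right\rfloor + \left\lfloor\frac{x+2n}{4}\right\rfloor + \left\lfloor\frac{x+3n}{4}\right\rfloor.$$
At least we know enough by now to consider cases based on n mod m. If
n mod 4 = 0 then
$$\left\lfloor\frac x4\right\rfloor + \left(\left\lfloor\frac x4\right\rfloor + \frac n4\right) + \left(\left\lfloor\frac x4\right\rfloor + \frac{2n}{4}\right) + \left(\left\lfloor\frac x4\right\rfloor + \frac{3n}{4}\right) = 4\left\lfloor\frac x4\right\rfloor + \frac{3n}{2}.$$
And if n mod 4 = 1,
$$\left\lfloor\frac x4\right\rfloor + \left(\left\lfloor\frac{x+1}{4}\right\rfloor + \frac{n-1}{4}\right) + \left(\left\lfloor\frac{x+2}{4}\right\rfloor + \frac{2n-2}{4}\right) + \left(\left\lfloor\frac{x+3}{4}\right\rfloor + \frac{3n-3}{4}\right) = \lfloor x\rfloor + \frac{3n}{2} - \frac32.$$
The case n mod 4 = 3 turns out to give the same answer. Finally, in the case
n mod 4 = 2 we get something a bit different, and this turns out to be an
important clue to the behavior in general:
$$\begin{aligned}
\left\lfloor\frac x4\right\rfloor + \left(\left\lfloor\frac{x+2}{4}\right\rfloor + \frac{n-2}{4}\right) + \left(\left\lfloor\frac x4\right\rfloor + \frac{2n}{4}\right) + \left(\left\lfloor\frac{x+2}{4}\right\rfloor + \frac{3n-2}{4}\right)
&= 2\left(\left\lfloor\frac x4\right\rfloor + \left\lfloor\frac{x+2}{4}\right\rfloor\right) + \frac{3n}{2} - 1\\
&= 2\left\lfloor\frac x2\right\rfloor + \frac{3n}{2} - 1.
\end{aligned}$$
This last step simplifies something of the form $\lfloor y/2\rfloor + \lfloor (y+1)/2\rfloor$, which
again is a special case of (3.26).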
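To summarize, here’s the value of our sum for small m:

    m   n mod m = 0       n mod m = 1          n mod m = 2          n mod m = 3
    1   ⌊x⌋
    2   2⌊x/2⌋ + n/2      ⌊x⌋ + n/2 - 1/2
    3   3⌊x/3⌋ + n        ⌊x⌋ + n - 1          ⌊x⌋ + n - 1
    4   4⌊x/4⌋ + 3n/2     ⌊x⌋ + 3n/2 - 3/2     2⌊x/2⌋ + 3n/2 - 1    ⌊x⌋ + 3n/2 - 3/2

It looks as if we’re getting something of the form
$$a\left\lfloor\frac xa\right\rfloor + bn + c,$$
where a, b, and c somehow depend on m and n. Even the myopic among
us can see that b is probably $(m-1)/2$.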
It’s harder to discern an expression
for a; but the case n mod 4 = 2 gives us a hint that a is probably gcd(m, n),
the greatest common divisor of m and n. This makes sense because gcd(m, n)
is the factor we remove from m and n when reducing the fraction n/m to
lowest terms, and our sum involves the fraction n/m. (We’ll look carefully
at gcd operations in Chapter 4.) The value of c seems more mysterious, but
perhaps it will drop out of our proofs for a and b.
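In computing the sum for small m, we’ve effectively rewritten each term
of the sum as
$$\left\lfloor\frac{x + kn}{m}\right\rfloor = \left\lfloor\frac{x + kn \bmod m}{m}\right\rfloor + \frac{kn}{m} - \frac{kn \bmod m}{m},$$
because $(kn - kn \bmod m)/m$ is an integer that can be removed from inside
the floor brackets. Thus the original sum can be expanded into the following
tableau:
$$\begin{array}{rcccl}
& \left\lfloor\dfrac{x}{m}\right\rfloor & + & \dfrac{0}{m} & - \dfrac{0 \bmod m}{m}\\[1ex]
+ & \left\lfloor\dfrac{x + n \bmod m}{m}\right\rfloor & + & \dfrac{n}{m} & - \dfrac{n \bmod m}{m}\\[1ex]
+ & \left\lfloor\dfrac{x + 2n \bmod m}{m}\right\rfloor & + & \dfrac{2n}{m} & - \dfrac{2n \bmod m}{m}\\
& \vdots & & \vdots & \quad\vdots\\
+ & \left\lfloor\dfrac{x + (m-1)n \bmod m}{m}\right\rfloor & + & \dfrac{(m-1)n}{m} & - \dfrac{(m-1)n \bmod m}{m}
\end{array}$$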
When we experimented with small values of m, these three columns led
respectively to $a\lfloor x/a\rfloor$, bn, and c.
In particular, we can see how b arises. The second column is an arithmetic
progression, whose sum we know: it’s the average of the first and last terms,
times the number of terms:
$$\frac12\left(0 + \frac{(m-1)n}{m}\right)\cdot m = \frac{(m-1)n}{2}.$$
So our guess that $b = (m-1)/2$ has been verified.
The first and third columns seem tougher; to determine a and c we must
take a closer look at the sequence of numbers
$$0 \bmod m,\quad n \bmod m,\quad 2n \bmod m,\quad \ldots,\quad (m-1)n \bmod m.$$
Lemma now, dilemma later.
Suppose, for example, that m = 12 and n = 5. If we think of the
sequence as times on a clock, the numbers are 0 o’clock (we take 12 o’clock
to be 0 o’clock), then 5 o’clock, 10 o’clock, 3 o’clock (= 15 o’clock), 8 o’clock,
and so on. It turns out that we hit every hour exactly once.
Now suppose m = 12 and n = 8. The numbers are 0 o’clock, 8 o’clock,
4 o’clock (= 16 o’clock), but then 0, 8, and 4 repeat. Since both 8 and 12 are
multiples of 4, and since the numbers start at 0 (also a multiple of 4), there’s
no way to break out of this pattern; they must all be multiples of 4.
In these two cases we have gcd(12, 5) = 1 and gcd(12, 8) = 4. The general
rule, which we will prove next chapter, states that if d = gcd(m, n) then we
get the numbers 0, d, 2d, . . . , $m - d$ in some order, followed by $d - 1$ more
copies of the same sequence. For example, with m = 12 and n = 8 the pattern
0, 8, 4 occurs four times.
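The clock experiments are easy to repeat for other m and n. The Python sketch below is an added illustration, not part of the original text; `multiples_mod` is a hypothetical name. It lists 0 mod m, n mod m, . . . , (m − 1)n mod m so that the pattern of d = gcd(m, n) repeated blocks can be seen directly.

```python
from math import gcd

def multiples_mod(n, m):
    """The sequence 0 mod m, n mod m, 2n mod m, ..., (m-1)n mod m."""
    return [(k * n) % m for k in range(m)]

for n in (5, 8):
    print(n, gcd(12, n), multiples_mod(n, 12))
```

With n = 5 every hour appears once; with n = 8 only the multiples of 4 appear, each of them four times.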
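The first column of our sum now makes complete sense. It contains
d copies of the terms $\lfloor x/m\rfloor$, $\lfloor (x+d)/m\rfloor$, . . . , $\lfloor (x+m-d)/m\rfloor$, in some
order, so its sum is
$$\begin{aligned}
d\left(\left\lfloor\frac{x}{m}\right\rfloor + \left\lfloor\frac{x+d}{m}\right\rfloor + \cdots + \left\lfloor\frac{x+m-d}{m}\right\rfloor\right)
&= d\left(\left\lfloor\frac{x/d}{m/d}\right\rfloor + \left\lfloor\frac{x/d+1}{m/d}\right\rfloor + \cdots + \left\lfloor\frac{x/d+m/d-1}{m/d}\right\rfloor\right)\\
&= d\left\lfloor\frac{x}{d}\right\rfloor.
\end{aligned}$$
This last step is yet another application of (3.26). Our guess for a has been
verified:
$$a = d = \gcd(m, n).$$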
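With a and b in hand, we can at least confirm by machine that whatever remains does not depend on x. The Python sketch below is an added illustration, not part of the original text; the function names are hypothetical, and x is restricted to integers on the assumption (justified by rule (3.11)) that replacing x by $\lfloor x\rfloor$ does not change any term of the sum.

```python
from fractions import Fraction
from math import gcd

def floored_progression_sum(m, n, x):
    """Sum of floor((n*k + x)/m) over 0 <= k < m, for integers m > 0, n, x."""
    return sum((n * k + x) // m for k in range(m))

def residual_c(m, n, x):
    """What is left after subtracting a*floor(x/a) + b*n, with a = gcd(m, n), b = (m-1)/2."""
    d = gcd(m, n)
    return floored_progression_sum(m, n, x) - d * (x // d) - Fraction((m - 1) * n, 2)

# For fixed m and n the residual should be one constant c, whatever x is.
for m, n in [(4, 2), (4, 3), (12, 8)]:
    print(m, n, {residual_c(m, n, x) for x in range(-20, 21)})
```

Each printed set has a single element, which is the value of c for that pair (m, n).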
You know you’re in college when the book doesn’t tell you how to
pronounce ‘Dirichlet’.
Exercises
Warmups
1 When we analyzed the Josephus problem in Chapter 1, we represented
an arbitrary positive integer n in the form $n = 2^m + l$, where $0 \le l < 2^m$.
Give explicit formulas for l and m as functions of n, using floor and/or
ceiling brackets.
2 What is a formula for the nearest integer to a given real number x? In case
of ties, when x is exactly halfway between two integers, give an expression
that rounds (a) up, that is, to $\lceil x\rceil$; (b) down, that is, to $\lfloor x\rfloor$.
3 Evaluate $\lfloor \lfloor m\alpha\rfloor n/\alpha \rfloor$, when m and n are positive integers and α is an
irrational number greater than n.
4 The text describes problems at levels 1 through 5. What is a level 0
problem? (This, by the way, is not a level 0 problem.)
5 Find a necessary and sufficient condition that $\lfloor nx\rfloor = n\lfloor x\rfloor$, when n is a
positive integer. (Your condition should involve {x}.)
6 Can something interesting be said about $\lfloor f(x)\rfloor$ when f(x) is a continuous,
monotonically decreasing function that takes integer values only when
x is an integer?
7 Solve the recurrence
$$X_n = n, \quad\text{for } 0 \le n < m;$$
$$X_n = X_{n-m} + 1, \quad\text{for } n \ge m.$$
8 Prove the Dirichlet box principle: If n objects are put into m boxes,
some box must contain $\ge \lceil n/m\rceil$ objects, and some box must contain
$\le \lfloor n/m\rfloor$.
9 Egyptian mathematicians in 1800 B.C. represented rational numbers between
0 and 1 as sums of unit fractions $1/x_1 + \cdots + 1/x_k$, where the x’s
were distinct positive integers. For example, they wrote $\frac13 + \frac1{15}$ instead
of $\frac25$. Prove that it is always possible to do this in a systematic way: If
0 < m/n < 1, then
$$\frac mn = \frac1q + \left\{\text{representation of } \frac mn - \frac1q\right\}, \qquad q = \left\lceil\frac nm\right\rceil.$$
(This is Fibonacci’s algorithm, due to Leonardo Fibonacci, A.D. 1202.)
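The systematic procedure in exercise 9 can be tried on examples before attempting the proof. The Python sketch below is an added illustration, not part of the original text, and it does not answer the exercise (which asks why the procedure always works); the name `egyptian` is hypothetical.

```python
from fractions import Fraction

def egyptian(m, n):
    """Denominators chosen by the greedy rule q = ceil(n/m), for 0 < m/n < 1."""
    rest = Fraction(m, n)
    denominators = []
    while rest > 0:
        q = -(-rest.denominator // rest.numerator)   # ceil(n/m) for the current fraction
        denominators.append(q)
        rest -= Fraction(1, q)                       # continue with m/n - 1/q
    return denominators

print(egyptian(2, 5))
print(egyptian(4, 13))
```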
Basics
10 Show that the expression
is always either $\lfloor x\rfloor$ or $\lceil x\rceil$. In what circumstances does each case arise?
11 Give details of the proof alluded to in the text, that the open interval
(α . . β) contains exactly $\lceil\beta\rceil - \lfloor\alpha\rfloor - 1$ integers when α < β. Why does
the case α = β have to be excluded in order to make the proof correct?
12 Prove that
$$\left\lceil\frac nm\right\rceil = \left\lfloor\frac{n+m-1}{m}\right\rfloor$$
for all integers n and all positive integers m. [This identity gives us
another way to convert ceilings to floors and vice versa, instead of using
the reflective law (3.4).]
13 Let α and β be positive real numbers. Prove that Spec(α) and Spec(β)
partition the positive integers if and only if α and β are irrational and
1/α + 1/β = 1.
14 Prove or disprove:
$$(x \bmod ny) \bmod y = x \bmod y, \qquad\text{integer } n.$$
15 Is there an identity analogous to (3.26) that uses ceilings instead of floors?
16 Prove that $n \bmod 2 = (1 - (-1)^n)/2$. Find and prove a similar expression
for n mod 3 in the form $a + b\omega^n + c\omega^{2n}$, where ω is the complex number
$(-1 + i\sqrt3)/2$. Hint: $\omega^3 = 1$ and $1 + \omega + \omega^2 = 0$.
17 Evaluate the sum $\sum_{0\le k<m} \lfloor x + k/m\rfloor$ in the case $x \ge 0$ by substituting
$\sum_j [1 \le j \le x + k/m]$ for $\lfloor x + k/m\rfloor$ and summing first on k. Does your
answer agree with (3.26)?
18 Prove that the boundary-value error term S in (3.30) is at most $\alpha^{-1}v$.
Hint: Show that small values of j are not involved.
Homework exercises
19 Find a necessary and sufficient condition on the real number b > 1 such
that
for all real $x \ge 1$.
20 Find the sum of all multiples of x in the closed interval [α . . β], when
x > 0.
21 How many of the numbers $2^m$, for $0 \le m \le M$, have leading digit 1 in
decimal notation?
22 Evaluate the sums $S_n = \sum_{k\ge1} \lfloor n/2^k + \frac12\rfloor$ and
$T_n = \sum_{k\ge1} 2^k \lfloor n/2^k + \frac12\rfloor^2$.
23 Show that the nth element of the sequence
1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, . . .
is $\lfloor\sqrt{2n} + \frac12\rfloor$. (The sequence contains exactly m occurrences of m.)
24 Exercise 13 establishes an interesting relation between the two multisets
Spec(α) and Spec(α/(α - 1)), when α is any irrational number > 1,
because $1/\alpha + (\alpha - 1)/\alpha = 1$. Find (and prove) an interesting relation
between the two multisets Spec(α) and Spec(α/(α + 1)), when α is any
positive real number.
25 Prove or disprove that the Knuth numbers, defined by (3.16), satisfy
$K_n \ge n$ for all nonnegative n.
26 Show that the auxiliary Josephus numbers (3.20) satisfy
for $n \ge 0$.
27 Prove that infinitely many of the numbers $D_n^{(3)}$ defined by (3.20) are
even, and that infinitely many are odd.
28 Solve the recurrence
$$a_0 = 1; \qquad a_n = a_{n-1} + \lfloor\sqrt{a_{n-1}}\rfloor, \quad\text{for } n > 0.$$
29 Show that, in addition to (3.31), we have
$$D(\alpha,n) \ge D(\alpha', \lfloor\alpha n\rfloor) - \alpha^{-1} - 2.$$
30 Show that the recurrence
$$X_0 = m, \qquad X_n = X_{n-1}^2 - 2, \quad\text{for } n > 0,$$
has the solution $X_n = \lceil\alpha^{2^n}\rceil$, if m is an integer greater than 2, where
$\alpha + \alpha^{-1} = m$ and α > 1. For example, if m = 3 the solution is
$$X_n = \lceil\phi^{2^{n+1}}\rceil, \qquad \phi = \frac{1+\sqrt5}{2}, \qquad \alpha = \phi^2.$$
31 Prove or disprove: $\lfloor x\rfloor + \lfloor y\rfloor + \lfloor x+y\rfloor \le \lfloor 2x\rfloor + \lfloor 2y\rfloor$.
32 Let $\|x\| = \min(x - \lfloor x\rfloor, \lceil x\rceil - x)$ denote the distance from x to the nearest
integer. What is the value of
$$\sum_k 2^k \|x/2^k\|^2 \;?$$
(Note that this sum can be doubly infinite. For example, when x = 1/3
the terms are nonzero as $k \to -\infty$ and also as $k \to +\infty$.)
Exam problems
33 A circle, $2n - 1$ units in diameter, has been drawn symmetrically on a
2n × 2n chessboard, illustrated here for n = 3:
a How many cells of the board contain a segment of the circle?
b Find a function f(n, k) such that exactly $\sum_{k=1}^{n-1} f(n, k)$ cells of the
board lie entirely within the circle.
34 Let $f(n) = \sum_{k=1}^{n} \lceil \lg k\rceil$.
a Find a closed form for f(n), when $n \ge 1$.
b Prove that $f(n) = n - 1 + f(\lceil n/2\rceil) + f(\lfloor n/2\rfloor)$ for all $n \ge 1$.
35 Simplify the formula $\lfloor (n+1)^2\, n!\, e\rfloor \bmod n$.
Simplify it, but don’t change the value.
36 Assuming that n is a nonnegative integer, find a closed form for the sum
$$\sum_{1<k<2^{2^n}} \frac{1}{2^{\lfloor\lg k\rfloor}\, 4^{\lfloor\lg\lg k\rfloor}}.$$
37 Prove the identity
$$\sum_{0\le k<m} \left(\left\lfloor\frac{m+k}{n}\right\rfloor - \left\lfloor\frac kn\right\rfloor\right) = \left\lfloor\frac{m^2}{n}\right\rfloor - \left\lfloor\frac{\min\bigl(m \bmod n,\ (-m) \bmod n\bigr)^2}{n}\right\rfloor$$
for all positive integers m and n.
38 Let $x_1, \ldots, x_n$ be real numbers such that the identity
holds for all positive integers m. Prove something interesting about
$x_1, \ldots, x_n$.
39 Prove that the double sum $\sum_{0\le k\le\log_b x} \sum_{0<j<b} \lceil (x + jb^k)/b^{k+1}\rceil$ equals
$(b-1)(\lfloor\log_b x\rfloor + 1) + \lceil x\rceil - 1$, for every real number $x \ge 1$ and every
integer b > 1.
40 The spiral function σ(n), indicated in the diagram below, maps a nonnegative
integer n onto an ordered pair of integers (x(n), y(n)). For
example, it maps n = 9 onto the ordered pair (1, 2).
[Figure: the spiral σ(n) drawn on the integer grid, with axes x and y.]
a Prove that if $m = \lfloor\sqrt n\rfloor$,
$$x(n) = (-1)^m \Bigl( \bigl(n - m(m+1)\bigr)\cdot\bigl[\lfloor 2\sqrt n\rfloor \text{ is even}\bigr] + \lceil \tfrac12 m\rceil \Bigr),$$
and find a similar formula for y(n). Hint: Classify the spiral into
segments $W_k$, $S_k$, $E_k$, $N_k$ according as $\lfloor 2\sqrt n\rfloor = 4k-2$, $4k-1$, $4k$, $4k+1$.
b Prove that, conversely, we can determine n from σ(n) by a formula
of the form
$$n = (2k)^2 \pm \bigl(2k + x(n) + y(n)\bigr), \qquad k = \max\bigl(|x(n)|, |y(n)|\bigr).$$
Give a rule for when the sign is + and when the sign is -.
Bonus problems
41 Let f and g be increasing functions such that the sets {f(1), f(2), ...} and {g(1), g(2), ...} partition the positive integers. Suppose that f and g are related by the condition g(n) = f(f(n)) + 1 for all n > 0. Prove that $f(n) = \lfloor n\phi \rfloor$ and $g(n) = \lfloor n\phi^2 \rfloor$, where $\phi = (1 + \sqrt5)/2$.
42 Do there exist real numbers α, β, and γ such that Spec(α), Spec(β), and Spec(γ) together partition the set of positive integers?
Research problems
49 Find a necessary and sufficient condition on the nonnegative real numbers α and β such that we can determine α and β from the infinite multiset of values
50 Let x be a real number $\ge \phi = \frac12 (1 + \sqrt5)$. The solution to the recurrence
$$ Z_0(x) = x, \qquad Z_n(x) = Z_{n-1}(x)^2 - 1, \quad \text{for } n > 0, $$
can be written $Z_n(x) = \lceil f(x)^{2^n} \rceil$, if x is an integer, where
$$ f(x) = \lim_{n \to \infty} Z_n(x)^{1/2^n}, $$
because $Z_n(x) - 1 < f(x)^{2^n} < Z_n(x)$. What interesting properties does this function f(x) have?
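51 Given nonnegative real numbers α and β, let
$$ \mathrm{Spec}(\alpha; \beta) = \{ \lfloor \alpha + \beta \rfloor, \lfloor 2\alpha + \beta \rfloor, \lfloor 3\alpha + \beta \rfloor, \ldots \} $$
be a multiset that generalizes Spec(α) = Spec(α; 0). Prove or disprove: If the m ≥ 3 multisets $\mathrm{Spec}(\alpha_1; \beta_1)$, $\mathrm{Spec}(\alpha_2; \beta_2)$, ..., $\mathrm{Spec}(\alpha_m; \beta_m)$ partition the positive integers, and if the parameters $\alpha_1 < \alpha_2 < \cdots < \alpha_m$ are rational, then
$$ \alpha_k = \frac{2^m - 1}{2^{k-1}}, \quad \text{for } 1 \le k \le m. $$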
52
Fibonacci’s algorithm (exercise 9) is “greedy” in the sense that it chooses
the least conceivable q at every step. A more complicated algorithm is
known by which every fraction m/n with n odd can be represented as a
sum of distinct unit fractions $1/q_1 + \cdots + 1/q_k$ with odd denominators.
Does the greedy algorithm for such a representation always terminate?
4
Number Theory
INTEGERS ARE CENTRAL to the discrete mathematics we are emphasiz-
ing in this book. Therefore we want to explore the theory of numbers, an
important branch of mathematics concerned with the properties of integers.
We tested the number theory waters in the previous chapter, by intro-
ducing binary operations called ‘mod’ and ‘gcd’. Now let’s plunge in and
really immerse ourselves in the subject.
4.1 DIVISIBILITY
We say that m divides n (or n is divisible by m) if m > 0 and the
ratio
n/m
is an integer. This property underlies all of number theory, so it’s
convenient to have a special notation for it. We therefore write
$$ m \backslash n \iff m > 0 \text{ and } n = mk \text{ for some integer } k. \tag{4.1} $$
(The notation ‘$m \mid n$’ is actually much more common than ‘$m \backslash n$’ in current mathematics literature. But vertical lines are overused (for absolute values, set delimiters, conditional probabilities, etc.) and backward slashes are underused. Moreover, ‘$m \backslash n$’ gives an impression that m is the denominator of an implied ratio. So we shall boldly let our divisibility symbol lean leftward.)
If m does not divide n we write ‘$m \not\backslash\, n$’.
There’s a similar relation, “n is a multiple of m,” which means almost the same thing except that m doesn’t have to be positive. In this case we simply mean that n = mk for some integer k. Thus, for example, there’s only one multiple of 0 (namely 0), but nothing is divisible by 0. Every integer is a multiple of −1, but no integer is divisible by −1 (strictly speaking). These definitions apply when m and n are any real numbers; for example, 2π is divisible by π. But we’ll almost always be using them when m and n are integers. After all, this is number theory.
[Margin: In other words, be prepared to drown.]
[Margin: “... no integer is divisible by −1 (strictly speaking).” - Graham, Knuth, and Patashnik [131]]
[Margin: In Britain we call this ‘hcf’ (highest common factor).]
[Margin: Not to be confused with the greatest common multiple.]
[Margin: (Remember that m′ or n′ can be negative.)]
The greatest common divisor of two integers m and n is the largest
integer that divides them both:
$$ \gcd(m, n) = \max\{\, k \mid k \backslash m \text{ and } k \backslash n \,\}. \tag{4.2} $$
For example, gcd(12, 18) = 6. This is a familiar notion, because it’s the common factor that fourth graders learn to take out of a fraction m/n when reducing it to lowest terms: 12/18 = (12/6)/(18/6) = 2/3. Notice that if n > 0 we have gcd(0, n) = n, because any positive number divides 0, and because n is the largest divisor of itself. The value of gcd(0, 0) is undefined.
Another familiar notion is the least common multiple,
$$ \mathrm{lcm}(m, n) = \min\{\, k \mid k > 0,\ m \backslash k \text{ and } n \backslash k \,\}; \tag{4.3} $$
this is undefined if m ≤ 0 or n ≤ 0. Students of arithmetic recognize this as the least common denominator, which is used when adding fractions with denominators m and n. For example, lcm(12, 18) = 36, and fourth graders know that $\frac{7}{12} + \frac{1}{18} = \frac{21}{36} + \frac{2}{36} = \frac{23}{36}$. The lcm is somewhat analogous to the gcd, but we don’t give it equal time because the gcd has nicer properties.
One of the nicest properties of the gcd is that it is easy to compute, using
a 2300-year-old method called Euclid’s algorithm. To calculate gcd(m,n),
for given values 0 ≤ m < n, Euclid’s algorithm uses the recurrence
$$ \gcd(0, n) = n; \qquad \gcd(m, n) = \gcd(n \bmod m,\ m), \quad \text{for } m > 0. \tag{4.4} $$
Thus, for example, gcd(12, 18) = gcd(6, 12) = gcd(0, 6) = 6. The stated recurrence is valid, because any common divisor of m and n must also be a common divisor of both m and the number n mod m, which is $n - \lfloor n/m \rfloor m$.
There doesn’t seem to be any recurrence for lcm(m,n) that’s anywhere near
as simple as this. (See exercise 2.)
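Recurrence (4.4) translates almost word for word into a loop. The following is a minimal Python sketch, not part of the original text; the function name `euclid_gcd` is a hypothetical choice, and the arguments are assumed to be nonnegative integers that are not both zero.

```python
def euclid_gcd(m: int, n: int) -> int:
    """Greatest common divisor via recurrence (4.4).

    gcd(0, n) = n;  gcd(m, n) = gcd(n mod m, m) for m > 0.
    Assumes m, n >= 0 and not both zero.
    """
    while m > 0:
        # Replace (m, n) by (n mod m, m), exactly as the recurrence does.
        m, n = n % m, m
    return n


# The worked example from the text: gcd(12, 18) = gcd(6, 12) = gcd(0, 6) = 6.
print(euclid_gcd(12, 18))
```

The loop terminates because the first argument strictly decreases at every step.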
Euclid’s algorithm also gives us more: We can extend it so that it will
compute integers m’ and n’ satisfying
m’m + n’n = gcd(m, n) .
(4.5)
Here’s how. If m = 0, we simply take m′ = 0 and n′ = 1. Otherwise we let r = n mod m and apply the method recursively with r and m in place of m and n, computing $\bar r$ and $\bar m$ such that
$$ \bar r\, r + \bar m\, m = \gcd(r, m). $$
Since $r = n - \lfloor n/m \rfloor m$ and gcd(r, m) = gcd(m, n), this equation tells us that
$$ \bar r\, (n - \lfloor n/m \rfloor m) + \bar m\, m = \gcd(m, n). $$
The left side can be rewritten to show its dependency on m and n:
$$ (\bar m - \lfloor n/m \rfloor \bar r)\, m + \bar r\, n = \gcd(m, n); $$
hence $m' = \bar m - \lfloor n/m \rfloor \bar r$ and $n' = \bar r$ are the integers we need in (4.5). For example, in our favorite case m = 12, n = 18, this method gives 6 = 0·0 + 1·6 = 1·6 + 0·12 = (−1)·12 + 1·18.
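The recursive construction of m′ and n′ can be sketched in Python as follows. This is an illustrative sketch under the same assumptions as the text (nonnegative integer inputs, not both zero); the name `extended_gcd` and the variable names `r_bar` and `m_bar` are hypothetical labels for the quantities the text writes with bars.

```python
def extended_gcd(m: int, n: int) -> tuple[int, int, int]:
    """Return (d, m1, n1) with m1*m + n1*n == d == gcd(m, n).

    Follows the text: if m = 0 take m' = 0, n' = 1; otherwise recurse
    on (r, m) with r = n mod m and recombine the coefficients.
    """
    if m == 0:
        return n, 0, 1
    r = n % m
    d, r_bar, m_bar = extended_gcd(r, m)   # r_bar*r + m_bar*m == d
    # Substitute r = n - (n // m) * m and collect the multiples of m and n.
    return d, m_bar - (n // m) * r_bar, r_bar


d, m1, n1 = extended_gcd(12, 18)
print(d, m1, n1)   # the text's example: 6 = (-1)*12 + 1*18
```

Because the returned coefficients can be checked with one multiplication and one addition, the caller does not have to trust the recursion; this is the self-certifying property in miniature.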
But why is (4.5) such a neat result? The main reason is that there’s a
sense in which the numbers m’ and n’ actually prove that Euclid’s algorithm
has produced the correct answer in any particular case. Let’s suppose that
our computer has told us after a lengthy calculation that gcd(m, n) = d and
that m′m + n′n = d; but we’re skeptical and think that there’s really a greater common divisor, which the machine has somehow overlooked. This cannot be, however, because any common divisor of m and n has to divide m′m + n′n; so it has to divide d; so it has to be ≤ d. Furthermore we can easily check that d does divide both m and n. (Algorithms that output their own proofs of correctness are called self-certifying.)
We’ll be using (4.5) a lot in the rest of this chapter. One of its important
consequences is the following mini-theorem:
$$ k \backslash m \text{ and } k \backslash n \iff k \backslash \gcd(m, n). \tag{4.6} $$
(Proof: If k divides both m and n, it divides m′m + n′n, so it divides gcd(m, n). Conversely, if k divides gcd(m, n), it divides a divisor of m and a divisor of n, so it divides both m and n.) We always knew that any common divisor of m and n must be less than or equal to their gcd; that’s the definition of greatest common divisor. But now we know that any common divisor is, in fact, a divisor of their gcd.
Sometimes we need to do sums over all divisors of n. In this case it’s often useful to use the handy rule
$$ \sum_{m \backslash n} a_m = \sum_{m \backslash n} a_{n/m}, \quad \text{integer } n > 0, \tag{4.7} $$
which holds since n/m runs through all divisors of n when m does. For example, when n = 12 this says that $a_1 + a_2 + a_3 + a_4 + a_6 + a_{12} = a_{12} + a_6 + a_4 + a_3 + a_2 + a_1$.
There’s also a slightly more general identity,
$$ \sum_{m \backslash n} a_m = \sum_{k} \sum_{m > 0} a_m \, [n = mk], \tag{4.8} $$
which is an immediate consequence of the definition (4.1). If n is positive, the right-hand side of (4.8) is $\sum_{k \backslash n} a_{n/k}$; hence (4.8) implies (4.7). And equation (4.8) works also when n is negative. (In such cases, the nonzero terms on the right occur when k is the negative of a divisor of n.)
Moreover, a double sum over divisors can be “interchanged” by the law
$$ \sum_{m \backslash n} \sum_{k \backslash m} a_{k,m} = \sum_{k \backslash n} \sum_{l \backslash (n/k)} a_{k,kl}. \tag{4.9} $$
For example, this law takes the following form when n = 12:
$$ a_{1,1} + (a_{1,2} + a_{2,2}) + (a_{1,3} + a_{3,3}) + (a_{1,4} + a_{2,4} + a_{4,4}) + (a_{1,6} + a_{2,6} + a_{3,6} + a_{6,6}) + (a_{1,12} + a_{2,12} + a_{3,12} + a_{4,12} + a_{6,12} + a_{12,12}) $$
$$ = (a_{1,1} + a_{1,2} + a_{1,3} + a_{1,4} + a_{1,6} + a_{1,12}) + (a_{2,2} + a_{2,4} + a_{2,6} + a_{2,12}) + (a_{3,3} + a_{3,6} + a_{3,12}) + (a_{4,4} + a_{4,12}) + (a_{6,6} + a_{6,12}) + a_{12,12}. $$
We can prove (4.9) with Iversonian manipulation. The left-hand side is
$$ \sum_{j,l} \sum_{k,m>0} a_{k,m} \, [n = jm][m = kl] = \sum_{j} \sum_{k,l>0} a_{k,kl} \, [n = jkl]; $$
the right-hand side is
$$ \sum_{j,m} \sum_{k,l>0} a_{k,kl} \, [n = jk][n/k = ml] = \sum_{m} \sum_{k,l>0} a_{k,kl} \, [n = mlk], $$
which is the same except for renaming the indices. This example indicates
that the techniques we’ve learned in Chapter 2 will come in handy as we study
number theory.
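Laws (4.7) and (4.9) are easy to spot-check numerically. The sketch below is an illustration only; the helper `divisors` and the test terms `a1` and `a2` are hypothetical, chosen so that distinct index pairs give distinct values and a mismatch in the index sets would show up in the totals.

```python
def divisors(n: int) -> list[int]:
    """All positive divisors of a positive integer n, in increasing order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def a1(m: int) -> int:
    """Arbitrary single-index test term (an assumption of this sketch)."""
    return m * m + 1


def a2(k: int, m: int) -> int:
    """Arbitrary double-index test term (an assumption of this sketch)."""
    return 1000 * k + m


def check_4_7(n: int) -> bool:
    # Sum over divisors m equals the sum over the complementary divisors n/m.
    return sum(a1(m) for m in divisors(n)) == sum(a1(n // m) for m in divisors(n))


def check_4_9(n: int) -> bool:
    # Left side: m runs over divisors of n, k over divisors of m.
    left = sum(a2(k, m) for m in divisors(n) for k in divisors(m))
    # Right side: k runs over divisors of n, l over divisors of n/k.
    right = sum(a2(k, k * l) for k in divisors(n) for l in divisors(n // k))
    return left == right


print(check_4_7(12), check_4_9(12))
```

A numerical check like this is no substitute for the Iversonian proof, but it catches index mistakes quickly.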
4.2 PRIMES
A positive integer p is called prime if it has just two divisors, namely 1 and p. Throughout the rest of this chapter, the letter p will always stand for a prime number, even when we don’t say so explicitly. By convention, 1 isn’t prime, so the sequence of primes starts out like this:
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, ....
[Margin: How about the p in ‘explicitly’?]
Some numbers look prime but aren’t, like 91 (= 7·13) and 161 (= 7·23). These
numbers and others that have three or more divisors are called composite.
Every integer greater than 1 is either prime or composite, but not both.
Primes are of great importance, because they’re the fundamental building
blocks of all the positive integers. Any positive integer n can be written as a product of primes,
$$ n = p_1 \ldots p_m = \prod_{k=1}^{m} p_k, \qquad p_1 \le \cdots \le p_m. \tag{4.10} $$
For example, 12 = 2·2·3; 11011 = 7·11·11·13; 11111 = 41·271. (Products denoted by ∏ are analogous to sums denoted by ∑, as explained in exercise 2.25. If m = 0, we consider this to be an empty product, whose value is 1 by definition; that’s the way n = 1 gets represented by (4.10).) Such a factorization is always possible because if n > 1 is not prime it has a divisor $n_1$ such that $1 < n_1 < n$; thus we can write $n = n_1 \cdot n_2$, and (by induction) we know that $n_1$ and $n_2$ can be written as products of primes.
Moreover, the expansion in (4.10) is unique: There’s only one way to
write n as a product of primes in nondecreasing order. This statement is
called the Fundamental Theorem of Arithmetic, and it seems so obvious that
we might wonder why it needs to be proved. How could there be two different
sets of primes with the same product? Well, there can’t, but the reason isn’t
simply “by definition of prime numbers.” For example, if we consider the set of all real numbers of the form $m + n\sqrt{10}$ when m and n are integers, the product of any two such numbers is again of the same form, and we can call such a number “prime” if it can’t be factored in a nontrivial way. The number 6 has two representations, $2 \cdot 3 = (4 + \sqrt{10})(4 - \sqrt{10})$; yet exercise 36 shows that 2, 3, $4 + \sqrt{10}$, and $4 - \sqrt{10}$ are all “prime” in this system.
Therefore we should prove rigorously that (4.10) is unique. There is
certainly only one possibility when n = 1, since the product must be empty
in that case; so let’s suppose that n > 1 and that all smaller numbers factor
uniquely. Suppose we have two factorizations
$$ n = p_1 \ldots p_m = q_1 \ldots q_k, \qquad p_1 \le \cdots \le p_m \text{ and } q_1 \le \cdots \le q_k, $$
where the p’s and q’s are all prime. We will prove that $p_1 = q_1$. If not, we can assume that $p_1 < q_1$, making $p_1$ smaller than all the q’s. Since $p_1$ and $q_1$ are prime, their gcd must be 1; hence Euclid’s self-certifying algorithm gives us integers a and b such that $a p_1 + b q_1 = 1$. Therefore
$$ a p_1 q_2 \ldots q_k + b q_1 q_2 \ldots q_k = q_2 \ldots q_k. $$
Now $p_1$ divides both terms on the left, since $q_1 q_2 \ldots q_k = n$; hence $p_1$ divides the right-hand side, $q_2 \ldots q_k$. Thus $q_2 \ldots q_k / p_1$ is an integer, and $q_2 \ldots q_k$ has a prime factorization in which $p_1$ appears. But $q_2 \ldots q_k < n$, so it has a unique factorization (by induction). This contradiction shows that $p_1$ must be equal to $q_1$ after all. Therefore we can divide both of n’s factorizations by $p_1$, obtaining $p_2 \ldots p_m = q_2 \ldots q_k < n$. The other factors must likewise be equal (by induction), so our proof of uniqueness is complete.
Sometimes it’s more useful to state the Fundamental Theorem in another way: Every positive integer can be written uniquely in the form
$$ n = \prod_{p} p^{n_p}, \quad \text{where each } n_p \ge 0. \tag{4.11} $$
[Margin: It’s the factorization, not the theorem, that’s unique.]
The right-hand side is a product over infinitely many primes; but for any
particular n all but a few exponents are zero, so the corresponding factors
are 1. Therefore it’s really a finite product, just as many “infinite” sums are
really finite because their terms are mostly zero.
Formula (4.11) represents n uniquely, so we can think of the sequence
$\langle n_2, n_3, n_5, \ldots \rangle$ as a number system for positive integers. For example, the prime-exponent representation of 12 is $\langle 2, 1, 0, 0, \ldots \rangle$ and the prime-exponent representation of 18 is $\langle 1, 2, 0, 0, \ldots \rangle$. To multiply two numbers, we simply add their representations. In other words,
$$ k = mn \iff k_p = m_p + n_p \text{ for all } p. \tag{4.12} $$
This implies that
$$ m \backslash n \iff m_p \le n_p \text{ for all } p, \tag{4.13} $$
and it follows immediately that
$$ k = \gcd(m, n) \iff k_p = \min(m_p, n_p) \text{ for all } p; \tag{4.14} $$
$$ k = \mathrm{lcm}(m, n) \iff k_p = \max(m_p, n_p) \text{ for all } p. \tag{4.15} $$
For example, since $12 = 2^2 \cdot 3^1$ and $18 = 2^1 \cdot 3^2$, we can get their gcd and lcm by taking the min and max of common exponents:
$$ \gcd(12, 18) = 2^{\min(2,1)} \cdot 3^{\min(1,2)} = 2^1 \cdot 3^1 = 6; $$
$$ \mathrm{lcm}(12, 18) = 2^{\max(2,1)} \cdot 3^{\max(1,2)} = 2^2 \cdot 3^2 = 36. $$
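Rules (4.14) and (4.15) can be exercised directly on prime-exponent representations. The sketch below is illustrative only: it factors by trial division, which is fine for small inputs but is an assumption of this example rather than anything the text prescribes, and all function names are hypothetical.

```python
from collections import Counter


def prime_exponents(n: int) -> Counter:
    """Map each prime p to its exponent n_p in n (trial division; n >= 1)."""
    exponents = Counter()
    p = 2
    while p * p <= n:
        while n % p == 0:
            exponents[p] += 1
            n //= p
        p += 1
    if n > 1:
        exponents[n] += 1   # whatever remains is itself prime
    return exponents


def from_exponents(exponents: dict) -> int:
    """Multiply out the product of p**e over the given representation."""
    result = 1
    for p, e in exponents.items():
        result *= p ** e
    return result


def gcd_by_exponents(m: int, n: int) -> int:
    mp, np_ = prime_exponents(m), prime_exponents(n)
    # A Counter reports exponent 0 for primes that do not occur.
    return from_exponents({p: min(mp[p], np_[p]) for p in mp.keys() | np_.keys()})


def lcm_by_exponents(m: int, n: int) -> int:
    mp, np_ = prime_exponents(m), prime_exponents(n)
    return from_exponents({p: max(mp[p], np_[p]) for p in mp.keys() | np_.keys()})


print(gcd_by_exponents(12, 18), lcm_by_exponents(12, 18))
```

In practice Euclid’s algorithm is far cheaper than factoring; the point here is only that min and max on exponents reproduce gcd and lcm.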
If the prime p divides a product mn then it divides either m or n, perhaps
both, because of the unique factorization theorem. But composite numbers
do not have this property. For example, the nonprime 4 divides 60 = 6·10, but it divides neither 6 nor 10. The reason is simple: In the factorization 60 = 6·10 = (2·3)(2·5), the two prime factors of 4 = 2·2 have been split
into two parts, hence 4 divides neither part. But a prime is unsplittable, so
it must divide one of the original factors.
4.3 PRIME EXAMPLES
How many primes are there? A lot. In fact, infinitely many. Euclid
proved this long ago in his Theorem 9:20, as follows. Suppose there were only finitely many primes, say k of them: 2, 3, 5, ..., $P_k$. Then, said Euclid, we should consider the number
$$ M = 2 \cdot 3 \cdot 5 \cdot \ldots \cdot P_k + 1. $$
None of the k primes can divide M, because each divides M − 1. Thus there must be some other prime that divides M; perhaps M itself is prime. This contradicts our assumption that 2, 3, ..., $P_k$ are the only primes, so there must indeed be infinitely many.
Euclid’s proof suggests that we define Euclid numbers by the recurrence
$$ e_n = e_1 e_2 \ldots e_{n-1} + 1, \quad \text{when } n \ge 1. \tag{4.16} $$
The sequence starts out
$$ e_1 = 1 + 1 = 2; \quad e_2 = 2 + 1 = 3; \quad e_3 = 2 \cdot 3 + 1 = 7; \quad e_4 = 2 \cdot 3 \cdot 7 + 1 = 43; $$
these are all prime. But the next case, $e_5$, is 1807 = 13·139. It turns out that $e_6 = 3263443$ is prime, while
$$ e_7 = 547 \cdot 607 \cdot 1033 \cdot 31051; \qquad e_8 = 29881 \cdot 67003 \cdot 9119521 \cdot 6212157481. $$
It is known that $e_9$, ..., $e_{17}$ are composite, and the remaining $e_n$ are probably composite as well. However, the Euclid numbers are all relatively prime to each other; that is,
$$ \gcd(e_m, e_n) = 1, \quad \text{when } m \ne n. $$
Euclid’s algorithm (what else?) tells us this in three short steps, because $e_n \bmod e_m = 1$ when n > m:
$$ \gcd(e_m, e_n) = \gcd(1, e_m) = \gcd(0, 1) = 1. $$
Therefore, if we let $q_j$ be the smallest factor of $e_j$ for all j ≥ 1, the primes $q_1$, $q_2$, $q_3$, ... are all different. This is a sequence of infinitely many primes.
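The Euclid numbers grow quickly, but the first few are easy to generate and the pairwise coprimality claim is easy to test. A small Python sketch follows (the function name `euclid_numbers` is hypothetical); it keeps a running product so that each new term is the product of all earlier terms plus 1, as in (4.16).

```python
from math import gcd


def euclid_numbers(count: int) -> list[int]:
    """First `count` Euclid numbers e_1, e_2, ... from recurrence (4.16)."""
    numbers = []
    product = 1              # empty product, so e_1 = 1 + 1 = 2
    for _ in range(count):
        e = product + 1
        numbers.append(e)
        product *= e
    return numbers


es = euclid_numbers(8)
print(es[:6])

# Every pair of distinct Euclid numbers should be relatively prime.
print(all(gcd(es[i], es[j]) == 1
          for i in range(len(es)) for j in range(i + 1, len(es))))
```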
Let’s pause to consider the Euclid numbers from the standpoint of Chapter 1. Can we express $e_n$ in closed form? Recurrence (4.16) can be simplified by removing the three dots: If n > 1 we have
$$ e_n = e_1 \ldots e_{n-2} e_{n-1} + 1 = (e_{n-1} - 1)\, e_{n-1} + 1 = e_{n-1}^2 - e_{n-1} + 1. $$
[Margin: Οἱ πρῶτοι ἀριθμοὶ πλείους εἰσὶ παντὸς τοῦ προτεθέντος πλήθους πρώτων ἀριθμῶν. - Euclid [80] [Translation: “There are more primes than in any given set of primes.”]]
Thus $e_n$ has about twice as many decimal digits as $e_{n-1}$. Exercise 37 proves that there’s a constant E ≈ 1.264 such that
$$ e_n = \lfloor E^{2^n} + \tfrac12 \rfloor. \tag{4.17} $$
And exercise 60 provides a similar formula that gives nothing but primes:
$$ p_n = \lfloor P^{3^n} \rfloor, \tag{4.18} $$
for some constant P. But equations like (4.17) and (4.18) cannot really be considered to be in closed form, because the constants E and P are computed from the numbers $e_n$ and $p_n$ in a sort of sneaky way. No independent relation is known (or likely) that would connect them with other constants of mathematical interest.
Indeed, nobody knows any useful formula that gives arbitrarily large
primes but only primes. Computer scientists at Chevron Geosciences did,
however, strike mathematical oil in 1984. Using a program developed by
David Slowinski, they discovered the largest prime known at that time,
$$ 2^{216091} - 1, $$
while testing a new Cray X-MP supercomputer. It’s easy to compute this number in a few milliseconds on a personal computer, because modern computers work in binary notation and this number is simply $(11\ldots1)_2$. All 216,091 of its bits are ‘1’. But it’s much harder to prove that this number is prime. In fact, just about any computation with it takes a lot of time, because it’s so large. For example, even a sophisticated algorithm requires several minutes just to convert $2^{216091} - 1$ to radix 10 on a PC. When printed out, its 65,050 decimal digits require 65 cents U.S. postage to mail first class.
[Margin: Or probably more, by the time you read this.]
Incidentally, $2^{216091} - 1$ is the number of moves necessary to solve the Tower of Hanoi problem when there are 216,091 disks. Numbers of the form
$$ 2^p - 1 $$
(where p is prime, as always in this chapter) are called Mersenne numbers, after Father Marin Mersenne who investigated some of their properties in the seventeenth century. The Mersenne primes known to date occur for p = 2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127, 521, 607, 1279, 2203, 2281, 3217, 4253, 4423, 9689, 9941, 11213, 19937, 21701, 23209, 44497, 86243, 110503, 132049, and 216091.
The number $2^n - 1$ can’t possibly be prime if n is composite, because $2^{km} - 1$ has $2^m - 1$ as a factor:
$$ 2^{km} - 1 = (2^m - 1)(2^{m(k-1)} + 2^{m(k-2)} + \cdots + 1). $$
But $2^p - 1$ isn’t always prime when p is prime; $2^{11} - 1 = 2047 = 23 \cdot 89$ is the smallest such nonprime. (Mersenne knew this.)
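Both claims about Mersenne numbers can be checked for small exponents. The sketch below uses plain trial division (an assumption of this example; the special Mersenne test mentioned later in the text is much faster) and a hypothetical helper `is_prime`.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for n up to about 2**31."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


# 2**(k*m) - 1 always has 2**m - 1 as a factor.
print(all((2 ** (k * m) - 1) % (2 ** m - 1) == 0
          for k in range(1, 8) for m in range(1, 8)))

# Prime exponents p <= 31 for which 2**p - 1 is prime.
print([p for p in range(2, 32) if is_prime(p) and is_prime(2 ** p - 1)])

# The smallest prime exponent that fails is p = 11.
print(2 ** 11 - 1, 23 * 89)
```

The second line of output should agree with the start of the list of known Mersenne exponents given in the text.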
Factoring and primality testing of large numbers are hot topics nowadays. A summary of what was known up to 1981 appears in Section 4.5.4 of [174], and many new results continue to be discovered. Pages 391-394 of that book explain a special way to test Mersenne numbers for primality.
For most of the last two hundred years, the largest known prime has been a Mersenne prime, although only 31 Mersenne primes are known. Many people are trying to find larger ones, but it’s getting tough. So those really interested in fame (if not fortune) and a spot in The Guinness Book of World Records might instead try numbers of the form $2^n k + 1$, for small values of k like 3 or 5. These numbers can be tested for primality almost as quickly as Mersenne numbers can; exercise 4.5.4-27 of [174] gives the details.
We haven’t fully answered our original question about how many primes
there are. There are infinitely many, but some infinite sets are “denser” than
others. For instance, among the positive integers there are infinitely many
even numbers and infinitely many perfect squares, yet in several important
senses there are more even numbers than perfect squares. One such sense
looks at the size of the nth value. The nth even integer is 2n and the nth perfect square is $n^2$; since 2n is much less than $n^2$ for large n, the nth even integer occurs much sooner than the nth perfect square, so we can say there are many more even integers than perfect squares. A similar sense looks at the number of values not exceeding x. There are $\lfloor x/2 \rfloor$ such even integers and $\lfloor \sqrt x \rfloor$ perfect squares; since x/2 is much larger than $\sqrt x$ for large x, again we can say there are many more even integers.
What can we say about the primes in these two senses? It turns out that the nth prime, $P_n$, is about n times the natural log of n:
$$ P_n \sim n \ln n. $$
(The symbol ‘∼’ can be read “is asymptotic to”; it means that the limit of the ratio $P_n / (n \ln n)$ is 1 as n goes to infinity.) Similarly, for the number of primes π(x) not exceeding x we have what’s known as the prime number theorem:
$$ \pi(x) \sim \frac{x}{\ln x}. $$
Proving these two facts is beyond the scope of this book, although we can show easily that each of them implies the other. In Chapter 9 we will discuss the rates at which functions approach infinity, and we’ll see that the function n ln n, our approximation to $P_n$, lies between 2n and $n^2$ asymptotically. Hence there are fewer primes than even integers, but there are more primes than perfect squares.
[Margin: Weird. I thought there were the same number of even integers as perfect squares, since there’s a one-to-one correspondence between them.]
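These formulas, which hold only in the limit as n or x → ∞, can be replaced by more exact estimates. For example, Rosser and Schoenfeld [253] have established the handy bounds
$$ \ln x - \tfrac32 < \frac{x}{\pi(x)} < \ln x - \tfrac12, \quad \text{for } x \ge 67; \tag{4.19} $$
$$ n \bigl( \ln n + \ln\ln n - \tfrac32 \bigr) < P_n < n \bigl( \ln n + \ln\ln n - \tfrac12 \bigr), \quad \text{for } n \ge 20. \tag{4.20} $$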
If we look at a “random” integer n, the chances of its being prime are about one in ln n. For example, if we look at numbers near $10^{16}$, we’ll have to examine about 16 ln 10 ≈ 36.8 of them before finding a prime. (It turns out that there are exactly 10 primes between $10^{16} - 370$ and $10^{16} - 1$.) Yet the distribution of primes has many irregularities. For example, all the numbers between $P_1 P_2 \ldots P_n + 2$ and $P_1 P_2 \ldots P_n + P_{n+1} - 1$ inclusive are composite. Many examples of “twin primes” p and p + 2 are known (5 and 7, 11 and 13, 17 and 19, 29 and 31, ..., 9999999999999641 and 9999999999999643, ...), yet nobody knows whether or not there are infinitely many pairs of twin primes. (See Hardy and Wright [150, §1.4 and §2.8].)
One simple way to calculate all π(x) primes ≤ x is to form the so-called sieve of Eratosthenes: First write down all integers from 2 through x. Next circle 2, marking it prime, and cross out all other multiples of 2. Then repeatedly circle the smallest uncircled, uncrossed number and cross out its other multiples. When everything has been circled or crossed out, the circled numbers are the primes. For example when x = 10 we write down 2 through 10, circle 2, then cross out its multiples 4, 6, 8, and 10. Next 3 is the smallest uncircled, uncrossed number, so we circle it and cross out 6 and 9. Now 5 is smallest, so we circle it and cross out 10. Finally we circle 7. The circled numbers are 2, 3, 5, and 7; so these are the π(10) = 4 primes not exceeding 10.
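The circle-and-cross-out procedure can be sketched in a few lines of Python. This is a plain rendering of the description above rather than an optimized sieve; the function name `sieve` and the list `crossed` are hypothetical names for this illustration.

```python
def sieve(x: int) -> list[int]:
    """Primes not exceeding x, by the sieve of Eratosthenes."""
    if x < 2:
        return []
    crossed = [False] * (x + 1)      # crossed[n] is True once n is crossed out
    primes = []
    for n in range(2, x + 1):
        if not crossed[n]:
            primes.append(n)         # "circle" the smallest untouched number
            for multiple in range(2 * n, x + 1, n):
                crossed[multiple] = True   # cross out its other multiples
    return primes


print(sieve(10))        # the example from the text
print(len(sieve(100)))  # pi(100)
```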
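4.4 FACTORIAL FACTORS
Now let’s take a look at the factorization of some interesting highly composite numbers, the factorials:
$$ n! = 1 \cdot 2 \cdot \ldots \cdot n = \prod_{k=1}^{n} k, \quad \text{integer } n \ge 0. \tag{4.21} $$
[Margin: “Je me sers de la notation très simple n! pour désigner le produit de nombres décroissans depuis n jusqu’à l’unité, savoir n(n − 1)(n − 2) ... 3.2.1. L’emploi continuel de l’analyse combinatoire que je fais dans la plupart de mes démonstrations, a rendu cette notation indispensable.” - Ch. Kramp [186] [Translation: “I use the very simple notation n! to denote the product of decreasing numbers from n down to unity, namely n(n − 1)(n − 2) ... 3·2·1. The continual use of combinatorial analysis that I make in most of my proofs has made this notation indispensable.”]]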
According to our convention for an empty product, this defines 0! to be 1. Thus n! = (n − 1)! n for every positive integer n. This is the number of permutations of n distinct objects. That is, it’s the number of ways to arrange n things in a row: There are n choices for the first thing; for each choice of first thing, there are n − 1 choices for the second; for each of these n(n − 1) choices, there are n − 2 for the third; and so on, giving n(n − 1)(n − 2) ... (1) arrangements in all. Here are the first few values of the factorial function.

    n    0  1  2  3   4    5    6     7      8       9       10
    n!   1  1  2  6  24  120  720  5040  40320  362880  3628800

It’s useful to know a few factorial facts, like the first six or so values, and the fact that 10! is about 3½ million plus change; another interesting fact is that the number of digits in n! exceeds n when n ≥ 25.
We can prove that n! is plenty big by using something like Gauss’s trick of Chapter 1:
$$ n!^2 = (1 \cdot 2 \cdot \ldots \cdot n)(n \cdot \ldots \cdot 2 \cdot 1) = \prod_{k=1}^{n} k(n + 1 - k). $$
We have $n \le k(n + 1 - k) \le \frac14 (n+1)^2$, since the quadratic polynomial $k(n + 1 - k) = \frac14 (n+1)^2 - \bigl(k - \frac12 (n+1)\bigr)^2$ has its smallest value at k = 1 and its largest value at $k = \frac12 (n + 1)$. Therefore
$$ \prod_{k=1}^{n} n \;\le\; n!^2 \;\le\; \prod_{k=1}^{n} \frac{(n+1)^2}{4}; $$
that is,
$$ n^{n/2} \le n! \le \frac{(n+1)^n}{2^n}. \tag{4.22} $$
This relation tells us that the factorial function grows exponentially!!
To approximate n! more accurately for large n we can use Stirling’s formula, which we will derive in Chapter 9:
$$ n! \sim \sqrt{2\pi n} \left( \frac{n}{e} \right)^{n}. \tag{4.23} $$
And a still more precise approximation tells us the asymptotic relative error: Stirling’s formula undershoots n! by a factor of about 1/(12n). Even for fairly small n this more precise estimate is pretty good. For example, Stirling’s approximation (4.23) gives a value near 3598696 when n = 10, and this is about 0.83% ≈ 1/120 too small. Good stuff, asymptotics.
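The numbers quoted for n = 10 can be reproduced in a few lines. The sketch below evaluates (4.23) in floating point (so the last digits are approximate) and also spot-checks the cruder bounds (4.22); the function name `stirling` is a hypothetical choice.

```python
import math


def stirling(n: int) -> float:
    """Stirling's approximation (4.23): sqrt(2*pi*n) * (n/e)**n."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n


approx = stirling(10)
exact = math.factorial(10)
relative_error = (exact - approx) / exact

print(round(approx))          # near 3598696
print(relative_error)         # close to 1/(12*10)

# Bounds (4.22): n**(n/2) <= n! <= ((n+1)/2)**n, checked for small n.
print(all(n ** (n / 2) <= math.factorial(n) <= ((n + 1) / 2) ** n
          for n in range(1, 20)))
```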
But let’s get back to primes. We’d like to determine, for any given prime p, the largest power of p that divides n!; that is, we want the exponent of p in n!’s unique factorization. We denote this number by $\epsilon_p(n!)$, and we start our investigations with the small case p = 2 and n = 10. Since 10! is the product of ten numbers, $\epsilon_2(10!)$ can be found by summing the powers-of-2 contributions of those ten numbers; this calculation corresponds to summing the columns of the following array:

                     1  2  3  4  5  6  7  8  9 10 | powers of 2
    divisible by 2      x     x     x     x     x | 5 = ⌊10/2⌋
    divisible by 4            x           x       | 2 = ⌊10/4⌋
    divisible by 8                        x       | 1 = ⌊10/8⌋
    powers of 2      0  1  0  2  0  1  0  3  0  1 | 8

(The column sums form what’s sometimes called the ruler function ρ(k), because of their similarity to the lengths of lines marking fractions of an inch.) The sum of these ten sums is 8; hence $2^8$ divides 10! but $2^9$ doesn’t.
There’s also another way: We can sum the contributions of the rows. The first row marks the numbers that contribute a power of 2 (and thus are divisible by 2); there are ⌊10/2⌋ = 5 of them. The second row marks those that contribute an additional power of 2; there are ⌊10/4⌋ = 2 of them. And the third row marks those that contribute yet another; there are ⌊10/8⌋ = 1 of them. These account for all contributions, so we have $\epsilon_2(10!) = 5 + 2 + 1 = 8$.
For general n this method gives
$$ \epsilon_2(n!) = \left\lfloor \frac{n}{2} \right\rfloor + \left\lfloor \frac{n}{4} \right\rfloor + \left\lfloor \frac{n}{8} \right\rfloor + \cdots = \sum_{k \ge 1} \left\lfloor \frac{n}{2^k} \right\rfloor. $$
This sum is actually finite, since the summand is zero when $2^k > n$. Therefore it has only ⌊lg n⌋ nonzero terms, and it’s computationally quite easy. For instance, when n = 100 we have
$$ \epsilon_2(100!) = 50 + 25 + 12 + 6 + 3 + 1 = 97. $$
Each term is just the floor of half the previous term. This is true for all n, because as a special case of (3.11) we have $\lfloor n/2^{k+1} \rfloor = \lfloor \lfloor n/2^k \rfloor / 2 \rfloor$. It’s especially easy to see what’s going on here when we write the numbers in binary:

    100        = (1100100)_2 = 100
    ⌊100/2⌋    =  (110010)_2 =  50
    ⌊100/4⌋    =   (11001)_2 =  25
    ⌊100/8⌋    =    (1100)_2 =  12
    ⌊100/16⌋   =     (110)_2 =   6
    ⌊100/32⌋   =      (11)_2 =   3
    ⌊100/64⌋   =       (1)_2 =   1

We merely drop the least significant bit from one term to get the next.
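The binary representation also shows us how to derive another formula,
$$ \epsilon_2(n!) = n - \nu_2(n), \tag{4.24} $$
where $\nu_2(n)$ is the number of 1’s in the binary representation of n. This simplification works because each 1 that contributes $2^m$ to the value of n contributes $2^{m-1} + 2^{m-2} + \cdots + 2^0 = 2^m - 1$ to the value of $\epsilon_2(n!)$.
Generalizing our findings to an arbitrary prime p, we have
$$ \epsilon_p(n!) = \left\lfloor \frac{n}{p} \right\rfloor + \left\lfloor \frac{n}{p^2} \right\rfloor + \left\lfloor \frac{n}{p^3} \right\rfloor + \cdots = \sum_{k \ge 1} \left\lfloor \frac{n}{p^k} \right\rfloor \tag{4.25} $$
by the same reasoning as before.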
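The row-sum method gives a very short program: keep dividing n by p, discarding remainders, and add up the quotients. The sketch below (with the hypothetical name `epsilon`) also checks the bit-count shortcut (4.24) against it.

```python
import math


def epsilon(p: int, n: int) -> int:
    """Exponent of the prime p in n!, as the sum of floor(n / p**k) for k >= 1."""
    total = 0
    while n > 0:
        n //= p          # each term is the floor of the previous term over p
        total += n
    return total


print(epsilon(2, 10))                 # 5 + 2 + 1
print(epsilon(2, 100))                # 50 + 25 + 12 + 6 + 3 + 1
print(100 - bin(100).count("1"))      # n minus the number of 1 bits, as in (4.24)

# Cross-check against the factorial itself: 2**8 divides 10! but 2**9 does not.
print(math.factorial(10) % 2 ** 8 == 0, math.factorial(10) % 2 ** 9 == 0)
```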
About how large is $\epsilon_p(n!)$? We get an easy (but good) upper bound by simply removing the floor from the summand and then summing an infinite geometric progression:
$$ \epsilon_p(n!) < \frac{n}{p} + \frac{n}{p^2} + \frac{n}{p^3} + \cdots = \frac{n}{p} \left( 1 + \frac{1}{p} + \frac{1}{p^2} + \cdots \right) = \frac{n}{p} \left( \frac{p}{p-1} \right) = \frac{n}{p-1}. $$
For p = 2 and n = 100 this inequality says that 97 < 100. Thus the upper bound 100 is not only correct, it’s also close to the true value 97. In fact, the true value $n - \nu_2(n)$ is ∼ n in general, because $\nu_2(n) \le \lceil \lg n \rceil$ is asymptotically much smaller than n.
When p = 2 and 3 our formulas give $\epsilon_2(n!) \sim n$ and $\epsilon_3(n!) \sim n/2$, so it seems reasonable that every once in awhile $\epsilon_3(n!)$ should be exactly half as big as $\epsilon_2(n!)$. For example, this happens when n = 6 and n = 7, because $6! = 2^4 \cdot 3^2 \cdot 5 = 7!/7$. But nobody has yet proved that such coincidences happen infinitely often.
The bound on $\epsilon_p(n!)$ in turn gives us a bound on $p^{\epsilon_p(n!)}$, which is p’s contribution to n!:
$$ p^{\epsilon_p(n!)} < p^{n/(p-1)}. $$
And we can simplify this formula (at the risk of greatly loosening the upper bound) by noting that $p \le 2^{p-1}$; hence $p^{n/(p-1)} \le (2^{p-1})^{n/(p-1)} = 2^n$. In other words, the contribution that any prime makes to n! is less than $2^n$.
We can use this observation to get another proof that there are infinitely many primes. For if there were only the k primes 2, 3, ..., $P_k$, then we’d have $n! < (2^n)^k = 2^{nk}$ for all n > 1, since each prime can contribute at most a factor of $2^n - 1$. But we can easily contradict the inequality $n! < 2^{nk}$ by choosing n large enough, say $n = 2^{2k}$. Then
$$ n! < 2^{nk} = 2^{2^{2k} k} = n^{n/2}, $$
contradicting the inequality $n! \ge n^{n/2}$ that we derived in (4.22). There are infinitely many primes, still.
We can even beef up this argument to get a crude bound on π(n), the number of primes not exceeding n. Every such prime contributes a factor of less than $2^n$ to n!; so, as before,
$$ n! < 2^{n \pi(n)}. $$
If we replace n! here by Stirling’s approximation (4.23), which is a lower bound, and take logarithms, we get
$$ n \pi(n) > n \lg(n/e) + \tfrac12 \lg(2\pi n); $$
hence
$$ \pi(n) > \lg(n/e) + \frac{1}{2n} \lg(2\pi n). $$
This lower bound is quite weak, compared with the actual value π(n) ∼ n/ln n, because log n is much smaller than n/log n when n is large. But we didn’t have to work very hard to get it, and a bound is a bound.
4.5 RELATIVE PRIMALITY
When gcd(m, n) = 1, the integers m and n have no prime factors in common and we say that they’re relatively prime.
This concept is so important in practice, we ought to have a special notation for it; but alas, number theorists haven’t come up with a very good one yet. Therefore we cry: HEAR US, O MATHEMATICIANS OF THE WORLD! LET US NOT WAIT ANY LONGER! WE CAN MAKE MANY FORMULAS CLEARER BY DEFINING A NEW NOTATION NOW! LET US AGREE TO WRITE ‘m ⊥ n’, AND TO SAY “m IS PRIME TO n,” IF m AND n ARE RELATIVELY PRIME. In other words, let us declare that
$$ m \perp n \iff m, n \text{ are integers and } \gcd(m, n) = 1. \tag{4.26} $$
[Margin: Like perpendicular lines don’t have a common direction, perpendicular numbers don’t have common factors.]
A fraction m/n is in lowest terms if and only if m ⊥ n. Since we reduce fractions to lowest terms by casting out the largest common factor of numerator and denominator, we suspect that, in general,
$$ m/\gcd(m, n) \;\perp\; n/\gcd(m, n); \tag{4.27} $$
and indeed this is true. It follows from a more general law, gcd(km, kn) = k gcd(m, n), proved in exercise 14.
The ⊥ relation has a simple formulation when we work with the prime-exponent representations of numbers, because of the gcd rule (4.14):
$$ m \perp n \iff \min(m_p, n_p) = 0 \text{ for all } p. \tag{4.28} $$
Furthermore, since $m_p$ and $n_p$ are nonnegative, we can rewrite this as
$$ m \perp n \iff m_p n_p = 0 \text{ for all } p. \tag{4.29} $$
[Margin: The dot product is zero, like orthogonal vectors.]
And now we can prove an important law by which we can split and combine two ⊥ relations with the same left-hand side:
$$ k \perp m \text{ and } k \perp n \iff k \perp mn. \tag{4.30} $$
In view of (4.29), this law is another way of saying that $k_p m_p = 0$ and $k_p n_p = 0$ if and only if $k_p (m_p + n_p) = 0$, when $m_p$ and $n_p$ are nonnegative.
There’s a beautiful way to construct the set of all nonnegative fractions
m/n with $m \perp n$, called the Stern-Brocot tree because it was discovered
independently by Moritz Stern [279], a German mathematician, and Achille
Brocot [35], a French clockmaker. The idea is to start with the two fractions
$(\frac01, \frac10)$ and then to repeat the following operation as many times
as desired:
$$\text{Insert } \frac{m+m'}{n+n'} \text{ between two adjacent fractions } \frac{m}{n} \text{ and } \frac{m'}{n'}.$$
[Margin note: Interesting how mathematicians will say “discovered” when
absolutely anyone else would have said “invented.”]
The new fraction (m+m’)/(n+n’) is called the mediant of m/n and m’/n’.
For example, the first step gives us one new entry between 0/1 and 1/0,
$$\frac01,\ \frac11,\ \frac10;$$
and the next gives two more:
$$\frac01,\ \frac12,\ \frac11,\ \frac21,\ \frac10.$$
The next gives four more,
$$\frac01,\ \frac13,\ \frac12,\ \frac23,\ \frac11,\ \frac32,\ \frac21,\ \frac31,\ \frac10;$$
and then we’ll get 8, 16, and so on. The entire array can be regarded as an
infinite binary tree structure whose top levels look like this:
[Figure: top levels of the Stern-Brocot tree.]
[Margin note: I guess 1/0 is infinity, “in lowest terms.”]
Each fraction is $\frac{m+m'}{n+n'}$, where $\frac{m}{n}$ is the nearest
ancestor above and to the left, and $\frac{m'}{n'}$ is the nearest ancestor
above and to the right. (An “ancestor” is a fraction that’s reachable by
following the branches upward.) Many patterns can be observed in this tree.
[Margin note: Conserve parody.]
Why does this construction work? Why, for example, does each mediant
fraction (m+m’)/(n+n’) turn out to be in lowest terms when it appears in
this tree? (If m, m’, n, and n’ were all odd, we’d get even/even; somehow the
construction guarantees that fractions with odd numerators and denominators
never appear next to each other.) And why do all possible fractions m/n occur
exactly once? Why can’t a particular fraction occur twice, or not at all?
All of these questions have amazingly simple answers, based on the
following fundamental fact: If m/n and m’/n’ are consecutive fractions at any
stage of the construction, we have
$$m'n - mn' = 1. \tag{4.31}$$
This relation is true initially ($1\cdot1 - 0\cdot0 = 1$); and when we insert
a new mediant (m + m’)/(n + n’), the new cases that need to be checked are
$$(m+m')n - m(n+n') = 1;$$
$$m'(n+n') - (m+m')n' = 1.$$
Both of these equations are equivalent to the original condition (4.31) that
they replace. Therefore (4.31) is invariant at all stages of the construction.
Furthermore, if m/n < m’/n’ and if all values are nonnegative, it’s easy
to verify that
$$m/n < (m+m')/(n+n') < m'/n'.$$
A mediant fraction isn’t halfway between its progenitors, but it does lie
somewhere in between. Therefore the construction preserves order, and we
couldn’t possibly get the same fraction in two different places.
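The construction and its invariant are easy to check mechanically. The following Python sketch is only an illustration: the function name `stern_brocot_row` and the choice to store each fraction as a (numerator, denominator) pair are assumptions of this example, not notation from the text.

```python
from math import gcd

def stern_brocot_row(steps):
    """Return the list of (numerator, denominator) pairs obtained after
    `steps` rounds of mediant insertion, starting from 0/1 and 1/0."""
    row = [(0, 1), (1, 0)]
    for _ in range(steps):
        new_row = [row[0]]
        for (m, n), (m2, n2) in zip(row, row[1:]):
            new_row.append((m + m2, n + n2))  # the mediant of m/n and m'/n'
            new_row.append((m2, n2))
        row = new_row
    return row

row3 = stern_brocot_row(3)

# Invariant (4.31): m'n - mn' = 1 for every pair of adjacent fractions.
invariant_holds = all(m2 * n - m * n2 == 1
                      for (m, n), (m2, n2) in zip(row3, row3[1:]))

# Every entry is in lowest terms (gcd of numerator and denominator is 1).
all_reduced = all(gcd(m, n) == 1 for m, n in row3)
```

Three rounds reproduce the nine fractions 0/1, 1/3, 1/2, 2/3, 1/1, 3/2, 2/1, 3/1, 1/0 listed earlier, and both checks succeed.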
[Margin note: True, but if you get a compound fracture you’d better go see
a doctor.]
One question still remains. Can any positive fraction a/b with $a \perp b$
possibly be omitted? The answer is no, because we can confine the
construction to the immediate neighborhood of a/b, and in this region the
behavior is easy to analyze: Initially we have
$$\frac{m}{n} = \frac01 < \Bigl(\frac{a}{b}\Bigr) < \frac10 = \frac{m'}{n'},$$
where we put parentheses around $\frac{a}{b}$ to indicate that it’s not really
present yet. Then if at some stage we have
$$\frac{m}{n} < \Bigl(\frac{a}{b}\Bigr) < \frac{m'}{n'},$$
the construction forms (m + m’)/(n + n’) and there are three cases. Either
(m + m’)/(n + n’) = a/b and we win; or (m + m’)/(n + n’) < a/b and we
can set $m \leftarrow m + m'$, $n \leftarrow n + n'$; or (m + m’)/(n + n’) > a/b
and we can set $m' \leftarrow m + m'$, $n' \leftarrow n + n'$. This process
cannot go on indefinitely, because the conditions
$$\frac{a}{b} - \frac{m}{n} > 0 \quad\text{and}\quad \frac{m'}{n'} - \frac{a}{b} > 0$$
imply that
$$an - bm \ge 1 \quad\text{and}\quad bm' - an' \ge 1;$$
hence
$$(m'+n')(an - bm) + (m+n)(bm' - an') \ge m'+n'+m+n;$$
and this is the same as $a + b \ge m' + n' + m + n$ by (4.31). Either m or n or
m’ or n’ increases at each step, so we must win after at most a + b steps.
The Farey series of order N, denoted by $\mathcal{F}_N$, is the set of all
reduced fractions between 0 and 1 whose denominators are N or less, arranged
in increasing order. For example, if N = 6 we have
$$\mathcal{F}_6 = \frac01, \frac16, \frac15, \frac14, \frac13, \frac25, \frac12, \frac35, \frac23, \frac34, \frac45, \frac56, \frac11.$$
We can obtain $\mathcal{F}_N$ in general by starting with
$\mathcal{F}_1 = \frac01, \frac11$ and then inserting mediants whenever it’s
possible to do so without getting a denominator that is too large. We don’t
miss any fractions in this way, because we know that the Stern-Brocot
construction doesn’t miss any, and because a mediant with denominator
$\le N$ is never formed from a fraction whose denominator is $> N$. (In other
words, $\mathcal{F}_N$ defines a subtree of the Stern-Brocot tree, obtained by
pruning off unwanted branches.) It follows that $m'n - mn' = 1$ whenever
m/n and m’/n’ are consecutive elements of a Farey series.
[Margin note: Farey ’nough.]
This method of construction reveals that $\mathcal{F}_N$ can be obtained in a
simple way from $\mathcal{F}_{N-1}$: We simply insert the fraction
(m + m’)/N between consecutive fractions m/n, m’/n’ of $\mathcal{F}_{N-1}$
whose denominators sum to N. For example, it’s easy to obtain $\mathcal{F}_7$
from the elements of $\mathcal{F}_6$, by inserting $\frac17$, $\frac27$,
. . . , $\frac67$ according to the stated rule:
$$\mathcal{F}_7 = \frac01, \frac17, \frac16, \frac15, \frac14, \frac27, \frac13, \frac25, \frac37, \frac12, \frac47, \frac35, \frac23, \frac57, \frac34, \frac45, \frac56, \frac67, \frac11.$$
When N is prime, N − 1 new fractions will appear; but otherwise we’ll have
fewer than N − 1, because this process generates only numerators that are
relatively prime to N.
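The step from one Farey series to the next can be sketched in a few lines of Python. This is an illustrative sketch only; the names `farey` and `farey_brute` and the pair representation of fractions are assumptions of the example. The second function is a brute-force reference that simply sorts all fractions a/b with denominator at most N, so the two can be compared.

```python
from fractions import Fraction

def farey(N):
    """Build the Farey series of order N from 0/1, 1/1 by inserting the
    mediant between consecutive fractions whose denominators sum to the
    order currently being built."""
    seq = [(0, 1), (1, 1)]
    for order in range(2, N + 1):
        nxt = [seq[0]]
        for (m, n), (m2, n2) in zip(seq, seq[1:]):
            if n + n2 == order:
                nxt.append((m + m2, order))  # new fraction with denominator `order`
            nxt.append((m2, n2))
        seq = nxt
    return seq

def farey_brute(N):
    """Reference version: every reduced fraction in [0, 1] with denominator <= N."""
    values = {Fraction(a, b) for b in range(1, N + 1) for a in range(b + 1)}
    return [(q.numerator, q.denominator) for q in sorted(values)]
```

Going from order 6 to order 7 adds six fractions (7 is prime), while going from order 5 to order 6 adds only two, 1/6 and 5/6.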
Long ago in (4.5) we proved (in different words) that whenever $m \perp n$
and $0 < m \le n$ we can find integers a and b such that
$$ma - nb = 1. \tag{4.32}$$
(Actually we said m’m + n’n = gcd(m, n), but we can write 1 for gcd(m, n),
a for m’, and b for −n’.) The Farey series gives us another proof of (4.32),
because we can let b/a be the fraction that precedes m/n in $\mathcal{F}_n$.
Thus (4.5) is just (4.31) again. For example, one solution to 3a − 7b = 1 is
a = 5, b = 2, since $\frac25$ precedes $\frac37$ in $\mathcal{F}_7$. This
construction implies that we can always find a solution to (4.32) with
$0 \le b < a < n$, if $0 < m < n$. Similarly, if $0 \le n < m$ and
$m \perp n$, we can solve (4.32) with $0 < a \le b \le m$ by letting a/b be the
fraction that follows n/m in $\mathcal{F}_m$.
Sequences of three consecutive terms in a Farey series have an amazing
property that is proved in exercise 61. But we had better not discuss the
Farey series any further, because the entire Stern-Brocot tree turns out to be
even more interesting.
We can, in fact, regard the Stern-Brocot tree as a number system for
representing rational numbers, because each positive, reduced fraction occurs
exactly once. Let’s use the letters L and R to stand for going down to the
left or right branch as we proceed from the root of the tree to a particular
fraction; then a string of L’s and R’s uniquely identifies a place in the tree.
For example, LRRL means that we go left from 1/1 down to 1/2, then right to
2/3, then right to 3/4, then left to 5/7. We can consider LRRL to be a
representation of 5/7. Every positive fraction gets represented in this way
as a unique string of L’s and R’s.
Well, actually there’s a slight problem: The fraction 1/1 corresponds to
the empty string, and we need a notation for that. Let’s agree to call it I,
because that looks something like 1 and it stands for “identity.”
This representation raises two natural questions: (1) Given positive
integers m and n with $m \perp n$, what is the string of L’s and R’s that
corresponds to m/n? (2) Given a string of L’s and R’s, what fraction
corresponds to it? Question 2 seems easier, so let’s work on it first. We
define
$$f(S) = \text{fraction corresponding to } S$$
when S is a string of L’s and R’s. For example, f(LRRL) = 5/7.
According to the construction, f(S) = (m + m’)/(n + n’) if m/n and
m’/n’ are the closest fractions preceding and following S in the upper levels
of the tree. Initially m/n = 0/1 and m’/n’ = 1/0; then we successively
replace either m/n or m’/n’ by the mediant (m + m’)/(n + n’) as we move
right or left in the tree, respectively.
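How can we capture this behavior in mathematical formulas that are
easy to deal with? A bit of experimentation suggests that the best way is to
maintain a 2 × 2 matrix
$$M(S) = \begin{pmatrix} n & n' \\ m & m' \end{pmatrix}$$
that holds the four quantities involved in the ancestral fractions m/n and
m’/n’ enclosing S. We could put the m’s on top and the n’s on the bottom,
fractionwise; but this upside-down arrangement works out more nicely because
we have $M(I) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ when the process
starts, and $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ is traditionally
called the identity matrix I.
A step to the left replaces n’ by n + n’ and m’ by m + m’; hence
$$M(SL) = \begin{pmatrix} n & n+n' \\ m & m+m' \end{pmatrix} = \begin{pmatrix} n & n' \\ m & m' \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = M(S) \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.$$
(This is a special case of the general rule
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} w & x \\ y & z \end{pmatrix} = \begin{pmatrix} aw+by & ax+bz \\ cw+dy & cx+dz \end{pmatrix}$$
for multiplying 2 × 2 matrices.) Similarly it turns out that
$$M(SR) = \begin{pmatrix} n+n' & n' \\ m+m' & m' \end{pmatrix} = M(S) \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}.$$
[Margin note: If you’re clueless about matrices, don’t panic; this book uses
them only here.]
Therefore if we define L and R as 2 × 2 matrices,
$$L = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \qquad R = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, \tag{4.33}$$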
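we get the simple formula M(S) = S, by induction on the length of S. Isn’t
that nice? (The letters L and R serve dual roles, as matrices and as letters in
the string representation.) For example,
$$M(LRRL) = LRRL = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} = \begin{pmatrix} 3 & 4 \\ 2 & 3 \end{pmatrix};$$
the ancestral fractions that enclose LRRL = 5/7 are 2/3 and 3/4. And this
construction gives us the answer to Question 2:
$$f(S) = f\left(\begin{pmatrix} n & n' \\ m & m' \end{pmatrix}\right) = \frac{m+m'}{n+n'}. \tag{4.34}$$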
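The answer to Question 2 can be tried out directly. The sketch below is an illustration under stated assumptions: it takes L to be the matrix with rows (1, 1), (0, 1) and R the matrix with rows (1, 0), (1, 1), stores matrices as nested tuples, and uses the hypothetical helper names `matmul`, `M`, and `f` chosen to mirror the text.

```python
# 2 x 2 matrices as ((top-left, top-right), (bottom-left, bottom-right)).
L = ((1, 1), (0, 1))
R = ((1, 0), (1, 1))
I = ((1, 0), (0, 1))

def matmul(A, B):
    """Ordinary product of two 2 x 2 integer matrices."""
    (a, b), (c, d) = A
    (w, x), (y, z) = B
    return ((a * w + b * y, a * x + b * z),
            (c * w + d * y, c * x + d * z))

def M(S):
    """Multiply out the string S of 'L' and 'R' letters, left to right."""
    result = I
    for letter in S:
        result = matmul(result, L if letter == "L" else R)
    return result

def f(S):
    """Fraction (numerator, denominator) for the string S, as in (4.34)."""
    (n, n2), (m, m2) = M(S)
    return (m + m2, n + n2)
```

For instance, `M("LRRL")` gives the matrix with rows (3, 4) and (2, 3), so `f("LRRL")` is the pair (5, 7), and the empty string gives (1, 1).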
How about Question 1? That’s easy, now that we understand the
fundamental connection between tree nodes and 2 × 2 matrices. Given a pair of
positive integers m and n, with $m \perp n$, we can find the position of m/n in
the Stern-Brocot tree by “binary search” as follows:

    S := I;
    while m/n ≠ f(S) do
      if m/n < f(S) then (output(L); S := SL)
                    else (output(R); S := SR)

This outputs the desired string of L’s and R’s.
There’s also another way to do the same job, by changing m and n instead
of maintaining the state S. If S is any 2 × 2 matrix, we have
$$f(RS) = f(S) + 1$$
because RS is like S but with the top row added to the bottom row. (Let’s
look at it in slow motion:
$$S = \begin{pmatrix} n & n' \\ m & m' \end{pmatrix}; \qquad RS = \begin{pmatrix} n & n' \\ m+n & m'+n' \end{pmatrix};$$
hence f(S) = (m+m’)/(n+n’) and f(RS) = ((m+n)+(m’+n’))/(n+n’).)
If we carry out the binary search algorithm on a fraction m/n with m > n,
the first output will be R; hence the subsequent behavior of the algorithm will
have f(S) exactly 1 greater than if we had begun with (m − n)/n instead of
m/n. A similar property holds for L, and we have
$$\frac{m}{n} = f(RS) \iff \frac{m-n}{n} = f(S), \quad\text{when } m > n;$$
$$\frac{m}{n} = f(LS) \iff \frac{m}{n-m} = f(S), \quad\text{when } m < n.$$
This means that we can transform the binary search algorithm to the following
matrix-free procedure:

    while m ≠ n do
      if m < n then (output(L); n := n − m)
               else (output(R); m := m − n).

For example, given m/n = 5/7, we have successively

    m =      5   5   3   1   1
    n =      7   2   2   2   1
    output     L   R   R   L

in the simplified algorithm.
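Both search procedures translate almost line for line into Python. The sketch below is illustrative only; `sb_path_search` keeps the two ancestral fractions as four integers rather than as a matrix (an assumption made for brevity), and `sb_path` is the subtraction-based version. The function names are hypothetical.

```python
from fractions import Fraction

def sb_path_search(m, n):
    """Binary search for m/n, tracking the ancestral fractions a/b < m/n < c/d."""
    a, b, c, d = 0, 1, 1, 0
    target = Fraction(m, n)
    out = []
    while True:
        mediant = Fraction(a + c, b + d)
        if target == mediant:
            return "".join(out)
        if target < mediant:
            out.append("L")
            c, d = a + c, b + d   # the mediant becomes the new right ancestor
        else:
            out.append("R")
            a, b = a + c, b + d   # the mediant becomes the new left ancestor

def sb_path(m, n):
    """Matrix-free version: repeatedly subtract the smaller of m, n from the larger.
    Assumes m and n are positive and relatively prime."""
    out = []
    while m != n:
        if m < n:
            out.append("L")
            n -= m
        else:
            out.append("R")
            m -= n
    return "".join(out)
```

Both functions return the string LRRL for 5/7, and they agree on every reduced fraction tried in the accompanying checks.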
Irrational numbers don’t appear in the Stern-Brocot tree, but all the
rational numbers that are “close” to them do. For example, if we try the
binary search algorithm with the number e = 2.71828 . . . , instead of with a
fraction m/n, we’ll get an infinite string of L’s and R’s that begins

    RRLRRLRLLLLRLRRRRRRLRLLLLLLLLRLR . . . .

We can consider this infinite string to be the representation of e in the
Stern-Brocot number system, just as we can represent e as an infinite decimal
2.718281828459 . . . or as an infinite binary fraction
$(10.101101111110\ldots)_2$. Incidentally, it turns out that e’s
representation has a regular pattern in the Stern-Brocot system:
$$e = R L^0 R L R^2 L R L^4 R L R^6 L R L^8 R L R^{10} L R L^{12} R L \ldots;$$
this is equivalent to a special case of something that Euler [84] discovered
when he was 24 years old.
From this representation we can deduce that the fractions

    letters:   R R L R R L R L L L L R L R R R R R R
    fractions: 1/1, 2/1, 3/1, 5/2, 8/3, 11/4, 19/7, 30/11, 49/18, 68/25,
               87/32, 106/39, 193/71, 299/110, 492/181, 685/252, 878/323,
               1071/394, 1264/465, . . .

are the simplest rational upper and lower approximations to e. For if m/n
does not appear in this list, then some fraction in this list whose numerator
is $\le m$ and whose denominator is $\le n$ lies between m/n and e. For
example, 27/10 is not as simple an approximation as 19/7 = 2.714 . . . , which
appears in the list and is closer to e. We can see this because the
Stern-Brocot tree not only includes all rationals, it includes them in order,
and because all fractions with small numerator and denominator appear above
all less simple ones. Thus, 27/10 = RRLRRLL is less than 19/7 = RRLRRL, which
is less than e = RRLRRLR . . . . Excellent approximations can be found in this
way. For example, $1264/465 \approx 2.718280$ agrees with e to six decimal
places; we obtained this fraction from the first 19 letters of e’s
Stern-Brocot representation, and the accuracy is about what we would get with
19 bits of e’s binary representation.
[Margin note: “Numerorum congruentiam hoc signo, ≡, in posterum denotabimus,
modulum ubi opus erit in clausulis adiungentes, −16 ≡ 9 (mod. 5),
−7 ≡ 15 (mod. 11).” (C. F. Gauss [115])]
We can find the infinite representation of an irrational number $\alpha$ by a
simple modification of the matrix-free binary search procedure:

    if α < 1 then (output(L); α := α/(1 − α))
             else (output(R); α := α − 1).

(These steps are to be repeated infinitely many times, or until we get tired.)
If $\alpha$ is rational, the infinite representation obtained in this way is
the same as before but with $RL^\infty$ appended at the right of $\alpha$’s
(finite) representation. For example, if $\alpha = 1$, we get RLLL . . . ,
corresponding to the infinite sequence of fractions
$\frac11, \frac21, \frac32, \frac43, \frac54, \ldots$, which approach 1 in the
limit. This situation is exactly analogous to ordinary binary notation, if we
think of L as 0 and R as 1: Just as every real number x in [0, 1) has an
infinite binary representation $(.b_1b_2b_3\ldots)_2$ not ending with all 1’s,
every real number $\alpha$ in $[0, \infty)$ has an infinite Stern-Brocot
representation $B_1B_2B_3\ldots$ not ending with all R’s. Thus we have a
one-to-one order-preserving correspondence between [0, 1) and $[0, \infty)$ if
we let $0 \leftrightarrow L$ and $1 \leftrightarrow R$.
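The real-number version of the procedure can be tried with exact rational arithmetic. In the sketch below, the function name `sb_prefix` is hypothetical, and e is replaced by the partial sum of 1/k! for k < 30; that substitution is an assumption of this example, chosen because the partial sum differs from e by far less than the gap between e and any fraction met in the first few dozen steps. Floating-point numbers would accumulate rounding error under the repeated transformations, so exact fractions are used instead.

```python
from fractions import Fraction
from math import factorial

def sb_prefix(alpha, length):
    """First `length` letters of the infinite Stern-Brocot representation
    of the nonnegative number alpha."""
    out = []
    for _ in range(length):
        if alpha < 1:
            out.append("L")
            alpha = alpha / (1 - alpha)
        else:
            out.append("R")
            alpha = alpha - 1
    return "".join(out)

# Rational stand-in for e (an assumption of this sketch, see the lead-in).
e_approx = sum(Fraction(1, factorial(k)) for k in range(30))

e_letters = sb_prefix(e_approx, 32)
one_letters = sb_prefix(Fraction(1), 4)
five_sevenths = sb_prefix(Fraction(5, 7), 6)
```

Here `e_letters` reproduces the 32 letters quoted for e, `one_letters` is RLLL, and `five_sevenths` is LRRL followed by RL, the beginning of the appended tail.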
There’s an intimate relationship between Euclid’s algorithm and the
Stern-Brocot representations of rationals. Given $\alpha = m/n$, we get
$\lfloor m/n \rfloor$ R’s, then $\lfloor n/(m \bmod n) \rfloor$ L’s, then
$\lfloor (m \bmod n)/(n \bmod (m \bmod n)) \rfloor$ R’s, and so on. These
numbers m mod n, n mod (m mod n), . . . are just the values examined in
Euclid’s algorithm. (A little fudging is needed at the end to make sure that
there aren’t infinitely many R’s.) We will explore this relationship further
in Chapter 6.
4.6 ‘MOD’: THE CONGRUENCE RELATION
Modular arithmetic is one of the main tools provided by number
theory. We got a glimpse of it in Chapter 3 when we used the binary operation
‘mod’, usually as one operation amidst others in an expression. In this chapter
we will use ‘mod’ also with entire equations, for which a slightly different
notation is more convenient:
$$a \equiv b \pmod{m} \iff a \bmod m = b \bmod m. \tag{4.35}$$
For example, $9 \equiv -16 \pmod 5$, because 9 mod 5 = 4 = (−16) mod 5. The
formula ‘$a \equiv b \pmod m$’ can be read “a is congruent to b modulo m.” The
definition makes sense when a, b, and m are arbitrary real numbers, but we
almost always use it with integers only.
Since x mod m differs from x by a multiple of m, we can understand
congruences in another way:
$$a \equiv b \pmod{m} \iff a - b \text{ is a multiple of } m. \tag{4.36}$$
For if a mod m = b mod m, then the definition of ‘mod’ in (3.21) tells us
that $a - b = a \bmod m + km - (b \bmod m + lm) = (k - l)m$ for some integers
k and l. Conversely if a − b = km, then a = b if m = 0; otherwise
$$a \bmod m = a - \lfloor a/m \rfloor m = b + km - \lfloor (b+km)/m \rfloor m = b - \lfloor b/m \rfloor m = b \bmod m.$$
The characterization of $\equiv$ in (4.36) is often easier to apply than
(4.35). For example, we have $8 \equiv 23 \pmod 5$ because 8 − 23 = −15 is a
multiple of 5; we don’t have to compute both 8 mod 5 and 23 mod 5.
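The two characterizations of congruence can be compared in a short Python sketch. The function names are hypothetical. The example relies on the fact that Python’s `%` operator on integers returns a result with the sign of the divisor, so that `(-16) % 5` is 4, which matches the ‘mod’ of the text for positive moduli.

```python
def congruent_by_remainders(a, b, m):
    """Definition (4.35): compare a mod m with b mod m (integer m > 0)."""
    return a % m == b % m

def congruent_by_difference(a, b, m):
    """Characterization (4.36): a - b is a multiple of m (integer m > 0)."""
    return (a - b) % m == 0

# The examples from the text.
example_1 = congruent_by_remainders(9, -16, 5)   # 9 mod 5 = 4 = (-16) mod 5
example_2 = congruent_by_difference(8, 23, 5)    # 8 - 23 = -15 is a multiple of 5
```

The two tests agree for every triple of small integers tried in the accompanying checks, as (4.36) says they must.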
The congruence sign ‘$\equiv$’ looks conveniently like ‘=’, because
congruences are almost like equations. For example, congruence is an
equivalence relation; that is, it satisfies the reflexive law
‘$a \equiv a$’, the symmetric law ‘$a \equiv b \implies b \equiv a$’, and the
transitive law ‘$a \equiv b \equiv c \implies a \equiv c$’.
[Margin note: “I feel fine today modulo a slight headache.”
(The Hacker’s Dictionary [277])]
All these properties are easy to prove, because any relation ‘$\equiv$’ that
satisfies ‘$a \equiv b \iff f(a) = f(b)$’ for some function f is an
equivalence relation. (In our case, f(x) = x mod m.) Moreover, we can add and
subtract congruent elements without losing congruence:
$$a \equiv b \text{ and } c \equiv d \implies a + c \equiv b + d \pmod{m};$$
$$a \equiv b \text{ and } c \equiv d \implies a - c \equiv b - d \pmod{m}.$$
For if a − b and c − d are both multiples of m, so are
(a + c) − (b + d) = (a − b) + (c − d) and (a − c) − (b − d) = (a − b) − (c − d).
Incidentally, it isn’t necessary to write ‘(mod m)’ once for every appearance
of ‘$\equiv$’; if the modulus is constant, we need to name it only once in
order to establish the context. This is one of the great conveniences of
congruence notation.
Multiplication works too, provided that we are dealing with integers:
$$a \equiv b \text{ and } c \equiv d \implies ac \equiv bd \pmod{m}, \quad\text{integers } b, c.$$
Proof: ac − bd = (a − b)c + b(c − d). Repeated application of this
multiplication property now allows us to take powers:
$$a \equiv b \implies a^n \equiv b^n \pmod{m}, \quad\text{integers } a, b; \text{ integer } n \ge 0.$$
For example, since $2 \equiv -1 \pmod 3$, we have $2^n \equiv (-1)^n \pmod 3$;
this means that $2^n - 1$ is a multiple of 3 if and only if n is even.
Thus, most of the algebraic operations that we customarily do with
equations can also be done with congruences. Most, but not all. The operation
of division is conspicuously absent. If $ad \equiv bd \pmod m$, we can’t always
conclude that $a \equiv b$. For example, $3\cdot2 \equiv 5\cdot2 \pmod 4$, but
$3 \not\equiv 5$.
We can salvage the cancellation property for congruences, however, in
the common case that d and m are relatively prime:
$$ad \equiv bd \iff a \equiv b \pmod{m}, \quad\text{integers } a, b, d, m \text{ and } d \perp m. \tag{4.37}$$
For example, it’s legit to conclude from $15 \equiv 35 \pmod m$ that
$3 \equiv 7 \pmod m$, unless the modulus m is a multiple of 5.
To prove this property, we use the extended gcd law (4.5) again, finding
d’ and m’ such that d’d + m’m = 1. Then if $ad \equiv bd$ we can multiply
both sides of the congruence by d’, obtaining $ad'd \equiv bd'd$. Since
$d'd \equiv 1$, we have $ad'd \equiv a$ and $bd'd \equiv b$; hence
$a \equiv b$. This proof shows that the number d’ acts almost like 1/d when
congruences are considered (mod m); therefore we call it the “inverse of d
modulo m.”
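The inverse d’ can be computed with the extended form of Euclid’s algorithm. The sketch below is one possible arrangement, not the text’s own program; the names `ext_gcd` and `inverse_mod` are hypothetical, and the inputs are assumed to be nonnegative integers with m > 0.

```python
def ext_gcd(d, m):
    """Return (g, d1, m1) with d1*d + m1*m == g == gcd(d, m), for d, m >= 0."""
    if m == 0:
        return d, 1, 0
    g, x, y = ext_gcd(m, d % m)
    # Here g == x*m + y*(d % m) and d % m == d - (d // m)*m.
    return g, y, x - (d // m) * y

def inverse_mod(d, m):
    """Inverse of d modulo m; defined only when d and m are relatively prime."""
    g, d1, _ = ext_gcd(d % m, m)
    if g != 1:
        raise ValueError("d and m are not relatively prime")
    return d1 % m
```

For example, `inverse_mod(3, 7)` is 5, since 3·5 = 15 leaves remainder 1 when divided by 7; multiplying a congruence modulo 7 by 5 therefore cancels a factor of 3.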
Another way to apply division to congruences is to divide the modulus
as well as the other numbers:
$$ad \equiv bd \pmod{md} \iff a \equiv b \pmod{m}, \quad\text{for } d \ne 0. \tag{4.38}$$
This law holds for all real a, b, d, and m, because it depends only on the
distributive law (a mod m)d = ad mod md: We have
$a \bmod m = b \bmod m \iff (a \bmod m)d = (b \bmod m)d \iff ad \bmod md = bd \bmod md$.
Thus, for example, from $3\cdot2 \equiv 5\cdot2 \pmod 4$ we conclude that
$3 \equiv 5 \pmod 2$.
We can combine (4.37) and (4.38) to get a general law that changes the
modulus as little as possible:
$$ad \equiv bd \pmod{m} \iff a \equiv b \pmod{\frac{m}{\gcd(d,m)}}, \quad\text{integers } a, b, d, m. \tag{4.39}$$
For we can multiply $ad \equiv bd$ by d’, where d’d + m’m = gcd(d, m); this
gives the congruence $a \cdot \gcd(d,m) \equiv b \cdot \gcd(d,m) \pmod m$,
which can be divided by gcd(d, m).
Let’s look a bit further into this idea of changing the modulus. If we
know that $a \equiv b \pmod{100}$, then we also must have
$a \equiv b \pmod{10}$, or modulo any divisor of 100. It’s stronger to say
that a − b is a multiple of 100 than to say that it’s a multiple of 10. In
general,
$$a \equiv b \pmod{md} \implies a \equiv b \pmod{m}, \quad\text{integer } d, \tag{4.40}$$
because any multiple of md is a multiple of m.
Conversely, if we know that $a \equiv b$ with respect to two small moduli, can
we conclude that $a \equiv b$ with respect to a larger one? Yes; the rule is
$$a \equiv b \pmod{m} \text{ and } a \equiv b \pmod{n} \iff a \equiv b \pmod{\operatorname{lcm}(m,n)}, \quad\text{integers } m, n > 0. \tag{4.41}$$
[Margin note: Modulitos?]
For example, if we know that $a \equiv b$ modulo 12 and 18, we can safely
conclude that $a \equiv b \pmod{36}$. The reason is that if a − b is a common
multiple of m and n, it is a multiple of lcm(m, n). This follows from the
principle of unique factorization.
The special case $m \perp n$ of this law is extremely important, because
lcm(m, n) = mn when m and n are relatively prime. Therefore we will state
it explicitly:
$$a \equiv b \pmod{mn} \iff a \equiv b \pmod{m} \text{ and } a \equiv b \pmod{n}, \quad\text{if } m \perp n. \tag{4.42}$$
For example, $a \equiv b \pmod{100}$ if and only if $a \equiv b \pmod{25}$ and
$a \equiv b \pmod 4$. Saying this another way, if we know x mod 25 and
x mod 4, then we have enough facts to determine x mod 100. This is a special
case of the Chinese Remainder Theorem (see exercise 30), so called because it
was discovered by Sun Tsŭ in China, about A.D. 350.
The moduli m and n in (4.42) can be further decomposed into relatively
prime factors until every distinct prime has been isolated. Therefore
$$a \equiv b \pmod{m} \iff a \equiv b \pmod{p^{m_p}} \text{ for all } p,$$
if the prime factorization (4.11) of m is $\prod_p p^{m_p}$. Congruences
modulo powers of primes are the building blocks for all congruences modulo
integers.
4.7 INDEPENDENT RESIDUES
One of the important applications of congruences is a residue number
system, in which an integer x is represented as a sequence of residues (or
remainders) with respect to moduli that are prime to each other:
$$\mathrm{Res}(x) = (x \bmod m_1, \ldots, x \bmod m_r), \quad\text{if } m_j \perp m_k \text{ for } 1 \le j < k \le r.$$
Knowing $x \bmod m_1$, . . . , $x \bmod m_r$ doesn’t tell us everything about
x. But it does allow us to determine x mod m, where m is the product
$m_1 \ldots m_r$. In practical applications we’ll often know that x lies in a
certain range; then we’ll know everything about x if we know x mod m and if m
is large enough.
For example, let’s look at a small case of a residue number system that
has only two moduli, 3 and 5:

    x mod 15   x mod 3   x mod 5
        0         0         0
        1         1         1
        2         2         2
        3         0         3
        4         1         4
        5         2         0
        6         0         1
        7         1         2
        8         2         3
        9         0         4
       10         1         0
       11         2         1
       12         0         2
       13         1         3
       14         2         4

Each ordered pair (x mod 3, x mod 5) is different, because x mod 3 = y mod 3
and x mod 5 = y mod 5 if and only if x mod 15 = y mod 15.
We can perform addition, subtraction, and multiplication on the two
components independently, because of the rules of congruences. For example,
if we want to multiply 7 = (1, 2) by 13 = (1, 3) modulo 15, we calculate
1·1 mod 3 = 1 and 2·3 mod 5 = 1. The answer is (1, 1) = 1; hence
7·13 mod 15 must equal 1. Sure enough, it does.
This independence principle is useful in computer applications, because
different components can be worked on separately (for example, by different
computers). If each modulus $m_k$ is a distinct prime $p_k$, chosen to be
slightly less than $2^{31}$, then a computer whose basic arithmetic operations
handle integers in the range $[-2^{31}, 2^{31})$ can easily compute sums,
differences, and products modulo $p_k$.
[Margin note: For example, the Mersenne prime $2^{31} - 1$ works well.]
A set of r such primes makes it possible to add, subtract, and multiply
“multiple-precision numbers” of up to almost 31r bits, and the residue system
makes it possible to do this faster than if such large numbers were added,
subtracted, or multiplied in other ways.
We can even do division, in appropriate circumstances. For example,
suppose we want to compute the exact value of a large determinant of integers.
The result will be an integer D, and bounds on |D| can be given based on the
size of its entries. But the only fast ways known for calculating determinants
require division, and this leads to fractions (and loss of accuracy, if we
resort to binary approximations). The remedy is to evaluate
$D \bmod p_k = D_k$, for various large primes $p_k$. We can safely divide
modulo $p_k$ unless the divisor happens to be a multiple of $p_k$. That’s very
unlikely, but if it does happen we can choose another prime. Finally, knowing
$D_k$ for sufficiently many primes, we’ll have enough information to
determine D.
But we haven’t explained how to get from a given sequence of residues
$(x \bmod m_1, \ldots, x \bmod m_r)$ back to x mod m. We’ve shown that this
conversion can be done in principle, but the calculations might be so
formidable that they might rule out the idea in practice. Fortunately, there
is a reasonably simple way to do the job, and we can illustrate it in the
situation (x mod 3, x mod 5) shown in our little table. The key idea is to
solve the problem in the two cases (1, 0) and (0, 1); for if (1, 0) = a and
(0, 1) = b, then (x, y) = (ax + by) mod 15, since congruences can be
multiplied and added.
In our case a = 10 and b = 6, by inspection of the table; but how could
we find a and b when the moduli are huge? In other words, if $m \perp n$, what
is a good way to find numbers a and b such that the equations
$$a \bmod m = 1, \quad a \bmod n = 0, \quad b \bmod m = 0, \quad b \bmod n = 1$$
all hold? Once again, (4.5) comes to the rescue: With Euclid’s algorithm, we
can find m’ and n’ such that
$$m'm + n'n = 1.$$
Therefore we can take a = n’n and b = m’m, reducing them both mod mn
if desired.
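The conversion from residues back to x mod mn can be sketched for two moduli as follows. This is an illustrative sketch under stated assumptions: the helper names `ext_gcd`, `basis`, and `from_residues` are hypothetical, m and n are assumed to be positive and relatively prime, and none of the further tricks needed for huge moduli are attempted.

```python
def ext_gcd(m, n):
    """Return (g, m1, n1) with m1*m + n1*n == g == gcd(m, n), for m, n >= 0."""
    if n == 0:
        return m, 1, 0
    g, x, y = ext_gcd(n, m % n)
    return g, y, x - (m // n) * y

def basis(m, n):
    """Return (a, b) with a = (1, 0) and b = (0, 1) in the residue system
    for the relatively prime moduli m and n."""
    g, m1, n1 = ext_gcd(m, n)          # m1*m + n1*n == 1
    if g != 1:
        raise ValueError("moduli must be relatively prime")
    a = (n1 * n) % (m * n)             # a mod m == 1, a mod n == 0
    b = (m1 * m) % (m * n)             # b mod m == 0, b mod n == 1
    return a, b

def from_residues(x_mod_m, x_mod_n, m, n):
    """Recover x mod mn from the pair (x mod m, x mod n)."""
    a, b = basis(m, n)
    return (a * x_mod_m + b * x_mod_n) % (m * n)
```

With moduli 3 and 5 this gives a = 10 and b = 6, as found by inspection of the table, and the product example works out componentwise: `from_residues((1 * 1) % 3, (2 * 3) % 5, 3, 5)` is 1, which is 7·13 mod 15.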
Further tricks are needed in order to minimize the calculations when the
moduli are large; the details are beyond the scope of this book, but they can
be found in [174, page 274]. Conversion from residues to the corresponding
original numbers is feasible, but it is sufficiently slow that we save total
time only if a sequence of operations can all be done in the residue number
system before converting back.
Let’s firm up these congruence ideas by trying to solve a little problem:
How many solutions are there to the congruence
$$x^2 \equiv 1 \pmod{m}, \tag{4.43}$$
if we consider two solutions x and x’ to be the same when $x \equiv x'$?
According to the general principles explained earlier, we should consider
first the case that m is a prime power, $p^k$, where k > 0. Then the
congruence $x^2 \equiv 1$ can be written
$$(x-1)(x+1) \equiv 0 \pmod{p^k},$$
so p must divide either x − 1 or x + 1, or both. But p can’t divide both
x − 1 and x + 1 unless p = 2; we’ll leave that case for later. If p > 2, then
$p^k \backslash (x-1)(x+1) \iff p^k \backslash (x-1)$ or
$p^k \backslash (x+1)$; so there are exactly two solutions, $x \equiv +1$ and
$x \equiv -1$.
The case p = 2 is a little different. If $2^k \backslash (x-1)(x+1)$ then
either x − 1 or x + 1 is divisible by 2 but not by 4, so the other one must be
divisible by $2^{k-1}$. This means that we have four solutions when
$k \ge 3$, namely $x \equiv \pm1$ and $x \equiv 2^{k-1} \pm 1$. (For example,
when $p^k = 8$ the four solutions are $x \equiv 1, 3, 5, 7 \pmod 8$; it’s
often useful to know that the square of any odd integer has the form 8n + 1.)
[Margin note: All primes are odd except 2, which is the oddest of all.]
Now $x^2 \equiv 1 \pmod m$ if and only if $x^2 \equiv 1 \pmod{p^{m_p}}$ for
all primes p with $m_p > 0$ in the complete factorization of m. Each prime is
independent of the others, and there are exactly two possibilities for
$x \bmod p^{m_p}$ except when p = 2. Therefore if m has exactly r different
prime divisors, the total number of solutions to $x^2 \equiv 1$ is $2^r$,
except for a correction when m is even. The exact number in general is
$$2^{\,r + [8\backslash m] + [4\backslash m] - [2\backslash m]}. \tag{4.44}$$
For example, there are four “square roots of unity modulo 12,” namely 1, 5,
7, and 11. When m = 15 the four are those whose residues mod 3 and mod 5
are $\pm1$, namely (1, 1), (1, 4), (2, 1), and (2, 4) in the residue number
system. These solutions are 1, 4, 11, and 14 in the ordinary (decimal) number
system.
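Formula (4.44) is easy to compare against a brute-force count. The sketch below is an illustration; the function names are hypothetical, and each bracketed condition in the exponent is evaluated as 1 when the divisibility holds and 0 otherwise.

```python
def distinct_prime_count(m):
    """Number r of different primes dividing m (trial division, small m only)."""
    r, p = 0, 2
    while p * p <= m:
        if m % p == 0:
            r += 1
            while m % p == 0:
                m //= p
        p += 1
    return r + (1 if m > 1 else 0)

def count_by_formula(m):
    """Number of solutions to x^2 = 1 (mod m) according to (4.44)."""
    r = distinct_prime_count(m)
    exponent = r + (m % 8 == 0) + (m % 4 == 0) - (m % 2 == 0)
    return 2 ** exponent

def square_roots_of_unity(m):
    """All x in [0, m) with x^2 congruent to 1 modulo m, by exhaustive search."""
    return [x for x in range(m) if (x * x - 1) % m == 0]
```

For m = 12 the search returns 1, 5, 7, 11 and for m = 15 it returns 1, 4, 11, 14, in agreement with the examples; the formula and the search agree for every modulus up to 300 in the accompanying checks.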
4.8 ADDITIONAL APPLICATIONS
There’s some unfinished business left over from Chapter 3: We wish
to prove that the m numbers
$$0 \bmod m,\ n \bmod m,\ 2n \bmod m,\ \ldots,\ (m-1)n \bmod m \tag{4.45}$$
consist of precisely d copies of the m/d numbers
$$0,\ d,\ 2d,\ \ldots,\ m-d$$
in some order, where d = gcd(m, n). For example, when m = 12 and n = 8
we have d = 4, and the numbers are 0, 8, 4, 0, 8, 4, 0, 8, 4, 0, 8, 4.
The first part of the proof (to show that we get d copies of the first
m/d values) is now trivial. We have
$$jn \equiv kn \pmod{m} \iff j(n/d) \equiv k(n/d) \pmod{m/d}$$
by (4.38); hence we get d copies of the values that occur when
$0 \le k < m/d$.
[Margin note: Mathematicians love to say that things are trivial.]
Now we must show that those m/d numbers are {0, d, 2d, . . . , m − d}
in some order. Let’s write m = m’d and n = n’d. Then kn mod m =
d(kn’ mod m’), by the distributive law (3.23); so the values that occur when
$0 \le k < m'$ are d times the numbers
$$0 \bmod m',\ n' \bmod m',\ 2n' \bmod m',\ \ldots,\ (m'-1)n' \bmod m'.$$
But we know that $m' \perp n'$ by (4.27); we’ve divided out their gcd.
Therefore we need only consider the case d = 1, namely the case that m and n
are relatively prime.
So let’s assume that $m \perp n$. In this case it’s easy to see that the
numbers (4.45) are just {0, 1, . . . , m − 1} in some order, by using the
“pigeonhole principle.” This principle states that if m pigeons are put into
m pigeonholes, there is an empty hole if and only if there’s a hole with more
than one pigeon. (Dirichlet’s box principle, proved in exercise 3.8, is
similar.) We know that the numbers (4.45) are distinct, because
$$jn \equiv kn \pmod{m} \iff j \equiv k \pmod{m}$$
when $m \perp n$; this is (4.37). Therefore the m different numbers must fill
all the pigeonholes 0, 1, . . . , m − 1. Therefore the unfinished business of
Chapter 3 is finished.
The proof is complete, but we can prove even more if we use a direct
method instead of relying on the indirect pigeonhole argument. If $m \perp n$
and if a value $j \in [0, m)$ is given, we can explicitly compute
$k \in [0, m)$ such that kn mod m = j by solving the congruence
$$kn \equiv j \pmod{m}$$
for k. We simply multiply both sides by n’, where m’m + n’n = 1, to get
$$k \equiv jn' \pmod{m};$$
hence k = jn’ mod m.
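Both the counting claim and the direct method can be checked numerically. In this sketch the function names are hypothetical, and the inverse n’ is obtained from Python’s built-in `pow(n, -1, m)`, which computes a modular inverse in Python 3.8 and later; that is a convenience of the example, standing in for the Euclid-based computation described in the text.

```python
from collections import Counter
from math import gcd

def multiples_mod(m, n):
    """The m numbers 0 mod m, n mod m, 2n mod m, ..., (m-1)n mod m of (4.45)."""
    return [(k * n) % m for k in range(m)]

def expected_counts(m, n):
    """d copies of each of 0, d, 2d, ..., m - d, where d = gcd(m, n)."""
    d = gcd(m, n)
    return Counter({i * d: d for i in range(m // d)})

def solve_for_k(j, n, m):
    """Solve k*n = j (mod m) for k in [0, m), assuming m and n are relatively prime."""
    n_inverse = pow(n, -1, m)   # requires Python 3.8+
    return (j * n_inverse) % m
```

With m = 12 and n = 8, `multiples_mod` returns 0, 8, 4 repeated four times, as in the example; and `solve_for_k(4, 5, 12)` returns 8, since 8·5 = 40 leaves remainder 4 modulo 12.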
We can use the facts just proved to establish an important result
discovered by Pierre de Fermat in 1640. Fermat was a great mathematician who
contributed to the discovery of calculus and many other parts of mathematics.
He left notebooks containing dozens of theorems stated without proof, and
each of those theorems has subsequently been verified, except one. The one
that remains, now called “Fermat’s Last Theorem,” states that
$$a^n + b^n \ne c^n \tag{4.46}$$
for all positive integers a, b, c, and n, when n > 2. (Of course there are lots
of solutions to the equations a + b = c and $a^2 + b^2 = c^2$.) This conjecture
has been verified for all $n \le 150000$ by Tanner and Wagstaff [285].
[Margin note: (NEWSFLASH) Euler [93] conjectured that
$a^4 + b^4 + c^4 \ne d^4$, but Noam Elkies found infinitely many solutions in
August, 1987. Now Roger Frye has done an exhaustive computer search, proving
(after about 110 hours on a Connection Machine) that the smallest solution is:
$95800^4 + 217519^4 + 414560^4 = 422481^4$.]
[Margin note: “. . . laquelle proposition, si elle est vraie, est de très
grand usage.” (P. de Fermat [97])]
Fermat’s theorem of 1640 is one of the many that turned out to be
provable. It’s now called Fermat’s Little Theorem (or just Fermat’s theorem,
for short), and it states that
$$n^{p-1} \equiv 1 \pmod{p}, \quad\text{if } n \perp p. \tag{4.47}$$
Proof: As usual, we assume that p denotes a prime. We know that the p − 1
numbers n mod p, 2n mod p, . . . , (p − 1)n mod p are the numbers 1, 2, . . . ,
p − 1 in some order. Therefore if we multiply them together we get
$$n \cdot (2n) \cdot \ldots \cdot ((p-1)n) \equiv (n \bmod p) \cdot (2n \bmod p) \cdot \ldots \cdot ((p-1)n \bmod p) \equiv (p-1)!,$$
where the congruence is modulo p. This means that
$$(p-1)!\, n^{p-1} \equiv (p-1)! \pmod{p},$$
and we can cancel the (p − 1)! since it’s not divisible by p. QED.
An alternative form of Fermat’s theorem is sometimes more convenient:
$$n^p \equiv n \pmod{p}, \quad\text{integer } n. \tag{4.48}$$
This congruence holds for all integers n. The proof is easy: If $n \perp p$ we
simply multiply (4.47) by n. If not, $p \backslash n$, so
$n^p \equiv 0 \equiv n$.
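Both forms of Fermat’s theorem, and the permutation fact used in the proof, can be spot-checked for small primes. The following sketch is illustrative only; the function names are hypothetical, and it uses Python’s three-argument `pow`, which computes a power modulo a given number.

```python
def multiples_are_permutation(n, p):
    """True when n mod p, 2n mod p, ..., (p-1)n mod p are 1, ..., p-1 in some order."""
    return sorted((k * n) % p for k in range(1, p)) == list(range(1, p))

def little_theorem_holds(n, p):
    """Form (4.47): n^(p-1) is congruent to 1 modulo p."""
    return pow(n, p - 1, p) == 1

def alternative_form_holds(n, p):
    """Form (4.48): n^p is congruent to n modulo p."""
    return pow(n, p, p) == n % p
```

For the primes 2, 3, 5, 7, 11, 13 and all n from 0 to 39 the alternative form always holds, and the other two checks hold whenever n is not a multiple of p. A composite modulus can fail: `alternative_form_holds(2, 9)` is false, because 2⁹ = 512 leaves remainder 8 modulo 9.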
In the same year that he discovered (4.47), Fermat wrote a letter to
Mersenne, saying he suspected that the number
$$f_n = 2^{2^n} + 1$$
would turn out to be prime for all $n \ge 0$. He knew that the first five cases
gave primes:
$$2^1+1 = 3;\quad 2^2+1 = 5;\quad 2^4+1 = 17;\quad 2^8+1 = 257;\quad 2^{16}+1 = 65537;$$
but he couldn’t see how to prove that the next case,
$2^{32} + 1 = 4294967297$, would be prime.
It’s interesting to note that Fermat could have proved that $2^{32} + 1$ is
not prime, using his own recently discovered theorem, if he had taken time to
perform a few dozen multiplications: We can set n = 3 in (4.47), deducing
that
$$3^{2^{32}} \equiv 1 \pmod{2^{32}+1}, \quad\text{if } 2^{32}+1 \text{ is prime.}$$
And it’s possible to test this relation by hand, beginning with 3 and squaring
32 times, keeping only the remainders mod $2^{32} + 1$. First we have
$3^2 = 9$, then $3^{2^2} = 81$, then $3^{2^3} = 6561$, and so on until we reach
$$3^{2^{32}} \equiv 3029026160 \pmod{2^{32}+1}.$$
[Margin note: If this is Fermat’s Little Theorem, the other one was last but
not least.]
The result isn’t 1, so $2^{32} + 1$ isn’t prime. This method of disproof gives
us no clue about what the factors might be, but it does prove that factors
exist. (They are 641 and 6700417.)
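The 32 squarings take no time at all on a computer. The sketch below simply repeats the hand calculation described above; the variable names are assumptions of this example.

```python
modulus = 2**32 + 1

# Start with 3 and square 32 times, keeping only the remainder each time.
x = 3
remainders = []
for _ in range(32):
    x = (x * x) % modulus
    remainders.append(x)

final_remainder = remainders[-1]        # this is 3^(2^32) mod (2^32 + 1)
is_possibly_prime = (final_remainder == 1)

# The same value obtained in one call to Python's three-argument pow.
check = pow(3, 2**32, modulus)
```

The first three entries of `remainders` are 9, 81, and 6561, the final remainder is the value 3029026160 quoted in the text, and `is_possibly_prime` is false. The stated factors can also be confirmed directly: 641 times 6700417 equals 4294967297.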
If $3^{2^{32}}$ had turned out to be 1, modulo $2^{32} + 1$, the calculation
wouldn’t have proved that $2^{32} + 1$ is prime; it just wouldn’t have
disproved it. But exercise 47 discusses a converse to Fermat’s theorem by
which we can prove that large prime numbers are prime, without doing an
enormous amount of laborious arithmetic.
We proved Fermat’s theorem by cancelling (p − 1)! from both sides of a
congruence. It turns out that (p − 1)! is always congruent to −1, modulo p;
this is part of a classical result known as Wilson’s theorem:
$$(n-1)! \equiv -1 \pmod{n} \iff n \text{ is prime}, \quad\text{if } n > 1. \tag{4.49}$$
One half of this theorem is trivial: If n > 1 is not prime, it has a prime
divisor p that appears as a factor of (n − 1)!, so (n − 1)! cannot be
congruent to −1. (If (n − 1)! were congruent to −1 modulo n, it would also be
congruent to −1 modulo p, but it isn’t.)
The other half of Wilson’s theorem states that $(p-1)! \equiv -1 \pmod p$.
We can prove this half by pairing up numbers with their inverses mod p. If
$n \perp p$, we know that there exists n’ such that
$$n'n \equiv 1 \pmod{p};$$
here n’ is the inverse of n, and n is also the inverse of n’. Any two inverses
of n must be congruent to each other, since $nn' \equiv nn''$ implies
$n' \equiv n''$.
[Margin note: If p is prime, is p’ prime prime?]
Now suppose we pair up each number between 1 and p − 1 with its inverse.
Since the product of a number and its inverse is congruent to 1, the product
of all the numbers in all pairs of inverses is also congruent to 1; so it seems
that (p − 1)! is congruent to 1. Let’s check, say for p = 5. We get 4! = 24;
but this is congruent to 4, not 1, modulo 5. Oops, what went wrong? Let’s
take a closer look at the inverses:
$$1' = 1, \quad 2' = 3, \quad 3' = 2, \quad 4' = 4.$$
Ah so; 2 and 3 pair up but 1 and 4 don’t; they’re their own inverses.
To resurrect our analysis we must determine which numbers are their own inverses. If $x$ is its own inverse, then $x^2 \equiv 1 \pmod p$; and we have already proved that this congruence has exactly two roots when $p > 2$. (If $p = 2$ it’s obvious that $(p-1)! \equiv -1$, so we needn’t worry about that case.) The roots are 1 and $p-1$, and the other numbers (between 1 and $p-1$) pair up; hence

$$(p-1)! \equiv 1\cdot(p-1) \equiv -1,$$

as desired.

[Margin note: “Si fuerit $N$ ad $x$ numerus primus et $n$ numerus partium ad $N$ primarum, tum potestas $x^n$ unitate minuta semper per numerum $N$ erit divisibilis.” -L. Euler [89]]

Unfortunately, we can’t compute factorials efficiently, so Wilson’s theorem is of no use as a practical test for primality. It’s just a theorem.
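For small numbers, though, Wilson’s theorem and the pairing argument can be checked directly. The Python sketch below is an illustration added here, not something from the text; `wilson_is_prime` and `self_inverses` are hypothetical helper names.

```python
from math import factorial

def wilson_is_prime(n: int) -> bool:
    """Test primality via (4.49): (n-1)! is congruent to -1 modulo n, for n > 1."""
    return n > 1 and factorial(n - 1) % n == n - 1

def self_inverses(p: int) -> list[int]:
    """The numbers x between 1 and p-1 that satisfy x*x = 1 (mod p)."""
    return [x for x in range(1, p) if (x * x) % p == 1]

print([n for n in range(2, 30) if wilson_is_prime(n)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
print(self_inverses(5))    # [1, 4]
```

The factorial grows so quickly that this is hopeless as a practical test, which is exactly the point made above.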
4.9 PHI AND MU
How many of the integers $\{0, 1, \ldots, m-1\}$ are relatively prime to $m$? This is an important quantity called $\varphi(m)$, the “totient” of $m$ (so named by J. J. Sylvester [284], a British mathematician who liked to invent new words). We have $\varphi(1) = 1$, $\varphi(p) = p-1$, and $\varphi(m) < m-1$ for all composite numbers $m$.
The $\varphi$ function is called Euler’s totient function, because Euler was the first person to study it. Euler discovered, for example, that Fermat’s theorem (4.47) can be generalized to nonprime moduli in the following way:

$$n^{\varphi(m)} \equiv 1 \pmod m,\qquad\text{if } n \perp m. \tag{4.50}$$

(Exercise 32 asks for a proof of Euler’s theorem.)
If $m$ is a prime power $p^k$, it’s easy to compute $\varphi(m)$, because $n \perp p^k \iff p \nmid n$. The multiples of $p$ in $\{0, 1, \ldots, p^k-1\}$ are $\{0, p, 2p, \ldots, p^k-p\}$; hence there are $p^{k-1}$ of them, and $\varphi(p^k)$ counts what is left:

$$\varphi(p^k) = p^k - p^{k-1}.$$

Notice that this formula properly gives $\varphi(p) = p-1$ when $k = 1$.
If $m > 1$ is not a prime power, we can write $m = m_1 m_2$ where $m_1 \perp m_2$. Then the numbers $0 \le n < m$ can be represented in a residue number system as $(n \bmod m_1, n \bmod m_2)$. We have

$$n \perp m \iff n \bmod m_1 \perp m_1 \ \text{ and }\ n \bmod m_2 \perp m_2$$

by (4.30) and (4.4). Hence, $n \bmod m$ is “good” if and only if $n \bmod m_1$ and $n \bmod m_2$ are both “good,” if we consider relative primality to be a virtue. The total number of good values modulo $m$ can now be computed, recursively: It is $\varphi(m_1)\varphi(m_2)$, because there are $\varphi(m_1)$ good ways to choose the first component $n \bmod m_1$ and $\varphi(m_2)$ good ways to choose the second component $n \bmod m_2$ in the residue representation.
For example, $\varphi(12) = \varphi(4)\varphi(3) = 2\cdot2 = 4$, because $n$ is prime to 12 if and only if $n \bmod 4 = (1 \text{ or } 3)$ and $n \bmod 3 = (1 \text{ or } 2)$. The four values prime to 12 are $(1,1)$, $(1,2)$, $(3,1)$, $(3,2)$ in the residue number system; they are 1, 5, 7, 11 in ordinary decimal notation. Euler’s theorem states that $n^4 \equiv 1 \pmod{12}$ whenever $n \perp 12$.

[Margin note: “Si sint $A$ et $B$ numeri inter se primi et numerus partium ad $A$ primarum sit $= a$, numerus vero partium ad $B$ primarum sit $= b$, tum numerus partium ad productum $AB$ primarum erit $= ab$.” -L. Euler [89]]

A function $f(m)$ of positive integers is called multiplicative if $f(1) = 1$ and

$$f(m_1 m_2) = f(m_1) f(m_2) \qquad\text{whenever } m_1 \perp m_2. \tag{4.51}$$
We have just proved that $\varphi(m)$ is multiplicative. We’ve also seen another instance of a multiplicative function earlier in this chapter: The number of incongruent solutions to $x^2 \equiv 1 \pmod m$ is multiplicative. Still another example is $f(m) = m^\alpha$ for any power $\alpha$.
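A small numerical experiment can make definition (4.51) concrete. The Python sketch below is our own illustration, not part of the text; the names `phi`, `square_roots_of_one`, and `looks_multiplicative` are hypothetical, and a finite check of this kind is evidence rather than proof.

```python
from math import gcd

def phi(m: int) -> int:
    """Euler's totient by definition: count n in {0, ..., m-1} prime to m."""
    return sum(1 for n in range(m) if gcd(n, m) == 1)

def square_roots_of_one(m: int) -> int:
    """Number of incongruent solutions to x^2 = 1 (mod m)."""
    return sum(1 for x in range(m) if (x * x - 1) % m == 0)

def looks_multiplicative(f, limit: int) -> bool:
    """Check f(1) = 1 and f(a*b) = f(a)*f(b) for all coprime a, b <= limit."""
    if f(1) != 1:
        return False
    return all(f(a * b) == f(a) * f(b)
               for a in range(1, limit + 1)
               for b in range(1, limit + 1)
               if gcd(a, b) == 1)

print(looks_multiplicative(phi, 15))                   # True
print(looks_multiplicative(square_roots_of_one, 15))   # True
print(looks_multiplicative(lambda m: m ** 3, 15))      # True
```

A function such as `lambda m: 1 if m == 1 else 2` fails the same check, since it gives 2 at 6 but 4 for the product of its values at 2 and 3.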
A multiplicative function is defined completely by its values at prime powers, because we can decompose any positive integer $m$ into its prime-power factors, which are relatively prime to each other. The general formula

$$f(m) = \prod_p f(p^{m_p}),\qquad\text{if } m = \prod_p p^{m_p} \tag{4.52}$$

holds if and only if $f$ is multiplicative.
In particular, this formula gives us the value of Euler’s totient function for general $m$:

$$\varphi(m) = \prod_{p\backslash m} (p^{m_p} - p^{m_p-1}) = m \prod_{p\backslash m} \Bigl(1 - \frac1p\Bigr). \tag{4.53}$$

For example, $\varphi(12) = (4-2)(3-1) = 12(1-\frac12)(1-\frac13)$.
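Formula (4.53) translates directly into a short program. The following Python sketch is an added illustration under our own naming (`prime_factorization`, `phi`, `phi_by_counting`); it uses plain trial division, which is adequate only for small $m$.

```python
from math import gcd

def prime_factorization(m: int) -> dict[int, int]:
    """Trial-division factorization: returns {prime: exponent}."""
    factors: dict[int, int] = {}
    p = 2
    while p * p <= m:
        while m % p == 0:
            factors[p] = factors.get(p, 0) + 1
            m //= p
        p += 1
    if m > 1:                      # whatever remains is a single prime
        factors[m] = factors.get(m, 0) + 1
    return factors

def phi(m: int) -> int:
    """Euler's totient via (4.53): product of p^k - p^(k-1) over prime powers."""
    result = 1
    for p, k in prime_factorization(m).items():
        result *= p**k - p**(k - 1)
    return result

def phi_by_counting(m: int) -> int:
    """The definition: how many n in {0, ..., m-1} are relatively prime to m."""
    return sum(1 for n in range(m) if gcd(n, m) == 1)

print(phi(12), phi_by_counting(12))   # 4 4
```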
Now let’s look at an application of the $\varphi$ function to the study of rational numbers mod 1. We say that the fraction $m/n$ is basic if $0 \le m < n$. Therefore $\varphi(n)$ is the number of reduced basic fractions with denominator $n$; and the Farey series $\mathcal{F}_n$ contains all the reduced basic fractions with denominator $n$ or less, as well as the non-basic fraction $\frac11$.
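The set of all basic fractions with denominator 12, before reduction to lowest terms, is

$$\tfrac{0}{12},\ \tfrac{1}{12},\ \tfrac{2}{12},\ \tfrac{3}{12},\ \tfrac{4}{12},\ \tfrac{5}{12},\ \tfrac{6}{12},\ \tfrac{7}{12},\ \tfrac{8}{12},\ \tfrac{9}{12},\ \tfrac{10}{12},\ \tfrac{11}{12}.$$

Reduction yields

$$\tfrac01,\ \tfrac1{12},\ \tfrac16,\ \tfrac14,\ \tfrac13,\ \tfrac5{12},\ \tfrac12,\ \tfrac7{12},\ \tfrac23,\ \tfrac34,\ \tfrac56,\ \tfrac{11}{12},$$

and we can group these fractions by their denominators:

$$\tfrac01;\quad \tfrac12;\quad \tfrac13,\ \tfrac23;\quad \tfrac14,\ \tfrac34;\quad \tfrac16,\ \tfrac56;\quad \tfrac1{12},\ \tfrac5{12},\ \tfrac7{12},\ \tfrac{11}{12}.$$

What can we make of this? Well, every divisor $d$ of 12 occurs as a denominator, together with all $\varphi(d)$ of its numerators. The only denominators that occur are divisors of 12. Thus

$$\varphi(1) + \varphi(2) + \varphi(3) + \varphi(4) + \varphi(6) + \varphi(12) = 12.$$

A similar thing will obviously happen if we begin with the unreduced fractions $\frac0m, \frac1m, \ldots, \frac{m-1}m$ for any $m$, hence

$$\sum_{d\backslash m} \varphi(d) = m. \tag{4.54}$$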
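The grouping argument behind (4.54) can be replayed with exact rational arithmetic. This Python sketch is an added illustration, not part of the text; the function name `denominators_after_reduction` is ours.

```python
from collections import Counter
from fractions import Fraction

def denominators_after_reduction(m: int) -> Counter:
    """Reduce 0/m, 1/m, ..., (m-1)/m and count how often each denominator occurs."""
    return Counter(Fraction(k, m).denominator for k in range(m))

counts = denominators_after_reduction(12)
print(sorted(counts.items()))
# [(1, 1), (2, 1), (3, 2), (4, 2), (6, 2), (12, 4)]
print(sum(counts.values()))   # 12
```

Each pair is (d, φ(d)) for a divisor d of 12, and the counts add up to 12 because every one of the twelve unreduced fractions lands in exactly one group.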
We said near the beginning of this chapter that problems in number theory often require sums over the divisors of a number. Well, (4.54) is one such sum, so our claim is vindicated. (We will see other examples.)
Now here’s a curious fact: If $f$ is any function such that the sum

$$g(m) = \sum_{d\backslash m} f(d)$$

is multiplicative, then $f$ itself is multiplicative. (This result, together with (4.54) and the fact that $g(m) = m$ is obviously multiplicative, gives another reason why $\varphi(m)$ is multiplicative.)
We can prove this curious fact by induction on $m$: The basis is easy because $f(1) = g(1) = 1$. Let $m > 1$, and assume that $f(m_1 m_2) = f(m_1) f(m_2)$ whenever $m_1 \perp m_2$ and $m_1 m_2 < m$. If $m = m_1 m_2$ and $m_1 \perp m_2$, we have

$$g(m_1 m_2) = \sum_{d\backslash m_1 m_2} f(d) = \sum_{d_1\backslash m_1} \sum_{d_2\backslash m_2} f(d_1 d_2),$$

and $d_1 \perp d_2$ since all divisors of $m_1$ are relatively prime to all divisors of $m_2$. By the induction hypothesis, $f(d_1 d_2) = f(d_1) f(d_2)$ except possibly when $d_1 = m_1$ and $d_2 = m_2$; hence we obtain

$$\Bigl(\sum_{d_1\backslash m_1} f(d_1) \sum_{d_2\backslash m_2} f(d_2)\Bigr) - f(m_1) f(m_2) + f(m_1 m_2) = g(m_1) g(m_2) - f(m_1) f(m_2) + f(m_1 m_2).$$

But this equals $g(m_1 m_2) = g(m_1) g(m_2)$, so $f(m_1 m_2) = f(m_1) f(m_2)$.
Conversely, if $f(m)$ is multiplicative, the corresponding sum-over-divisors function $g(m) = \sum_{d\backslash m} f(d)$ is always multiplicative. In fact, exercise 33 shows that even more is true. Hence the curious fact is a fact.
The Möbius function $\mu(m)$, named after the nineteenth-century mathematician August Möbius who also had a famous band, is defined for all $m \ge 1$ by the equation

$$\sum_{d\backslash m} \mu(d) = [m=1]. \tag{4.55}$$
This equation is actually a recurrence, since the left-hand side is a sum consisting of $\mu(m)$ and certain values of $\mu(d)$ with $d < m$. For example, if we plug in $m = 1, 2, \ldots, 12$ successively we can compute the first twelve values:

$$\begin{array}{c|rrrrrrrrrrrr}
n & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 \\ \hline
\mu(n) & 1 & -1 & -1 & 0 & -1 & 1 & -1 & 0 & 0 & 1 & -1 & 0
\end{array}$$
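Plugging in $m = 1, 2, \ldots$ successively is exactly what a loop does. The Python sketch below, added here as an illustration with a hypothetical function name `mobius_table`, solves (4.55) for $\mu(m)$ in terms of the earlier values.

```python
def mobius_table(limit: int) -> list[int]:
    """Return [mu(0) placeholder, mu(1), ..., mu(limit)] using recurrence (4.55)."""
    mu = [0] * (limit + 1)
    for m in range(1, limit + 1):
        # Sum of mu(d) over the proper divisors d < m of m.
        smaller = sum(mu[d] for d in range(1, m) if m % d == 0)
        # (4.55) says mu(m) + smaller = [m = 1].
        mu[m] = (1 if m == 1 else 0) - smaller
    return mu

print(mobius_table(12)[1:])
# [1, -1, -1, 0, -1, 1, -1, 0, 0, 1, -1, 0]
```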
Möbius came up with the recurrence formula (4.55) because he noticed that it corresponds to the following important “inversion principle”:

$$g(m) = \sum_{d\backslash m} f(d) \iff f(m) = \sum_{d\backslash m} \mu(d)\, g\Bigl(\frac md\Bigr). \tag{4.56}$$
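According to this principle, the $\mu$ function gives us a new way to understand any function $f(m)$ for which we know $\sum_{d\backslash m} f(d)$.

[Margin note: Now is a good time to try warmup exercise 11.]

The proof of (4.56) uses two tricks (4.7) and (4.9) that we described near the beginning of this chapter: If $g(m) = \sum_{d\backslash m} f(d)$ then

$$\begin{aligned}
\sum_{d\backslash m} \mu(d)\, g\Bigl(\frac md\Bigr) &= \sum_{d\backslash m} \mu\Bigl(\frac md\Bigr) g(d) \\
&= \sum_{d\backslash m} \mu\Bigl(\frac md\Bigr) \sum_{k\backslash d} f(k) \\
&= \sum_{k\backslash m}\ \sum_{d\backslash(m/k)} \mu\Bigl(\frac m{kd}\Bigr) f(k) \\
&= \sum_{k\backslash m}\ \sum_{d\backslash(m/k)} \mu(d)\, f(k) \\
&= \sum_{k\backslash m} [m/k=1]\, f(k) = f(m).
\end{aligned}$$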
The other half of (4.56) is proved similarly (see exercise 12).

Relation (4.56) gives us a useful property of the Möbius function, and we have tabulated the first twelve values; but what is the value of $\mu(m)$ when $m$ is large? How can we solve the recurrence (4.55)? Well, the function $g(m) = [m=1]$ is obviously multiplicative; after all, it’s zero except when $m = 1$. So the Möbius function defined by (4.55) must be multiplicative, by what we proved a minute or two ago. Therefore we can figure out what $\mu(m)$ is if we compute $\mu(p^k)$.

[Margin note: Depending on how fast you read.]
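When $m = p^k$, (4.55) says that

$$\mu(1) + \mu(p) + \mu(p^2) + \cdots + \mu(p^k) = 0$$

for all $k \ge 1$, since the divisors of $p^k$ are $1, \ldots, p^k$. It follows that

$$\mu(p) = -1;\qquad \mu(p^k) = 0 \text{ for } k > 1.$$

Therefore by (4.52), we have the general formula

$$\mu(m) = \prod_{p\backslash m} \mu(p^{m_p}) = \begin{cases} (-1)^r, & \text{if } m = p_1 p_2 \ldots p_r; \\ 0, & \text{if $m$ is divisible by some } p^2. \end{cases} \tag{4.57}$$

That’s $\mu$.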
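If we regard (4.54) as a recurrence for the function $\varphi(m)$, we can solve that recurrence by applying Möbius’s rule (4.56). The resulting solution is

$$\varphi(m) = \sum_{d\backslash m} \mu(d)\, \frac md. \tag{4.58}$$

For example,

$$\begin{aligned}
\varphi(12) &= \mu(1)\cdot12 + \mu(2)\cdot6 + \mu(3)\cdot4 + \mu(4)\cdot3 + \mu(6)\cdot2 + \mu(12)\cdot1 \\
&= 12 - 6 - 4 + 0 + 2 + 0 = 4.
\end{aligned}$$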
If $m$ is divisible by $r$ different primes, say $\{p_1, \ldots, p_r\}$, the sum (4.58) has only $2^r$ nonzero terms, because the $\mu$ function is often zero. Thus we can see that (4.58) checks with formula (4.53), which reads

$$\varphi(m) = m \Bigl(1 - \frac1{p_1}\Bigr) \ldots \Bigl(1 - \frac1{p_r}\Bigr);$$

if we multiply out the $r$ factors $(1 - 1/p_j)$, we get precisely the $2^r$ nonzero terms of (4.58). The advantage of the Möbius function is that it applies in many situations besides this one.
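Formulas (4.57) and (4.58) can be checked against the definition of the totient. The Python sketch below is our own illustration; the names `mu`, `phi_via_mobius`, and `phi_by_counting` are hypothetical, and trial division is assumed to be fast enough for the small arguments used.

```python
from math import gcd

def mu(m: int) -> int:
    """Möbius function via (4.57): (-1)^r for r distinct primes, else 0."""
    result = 1
    p = 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:       # p^2 divided the original argument
                return 0
            result = -result
        p += 1
    if m > 1:                    # one prime factor is left over
        result = -result
    return result

def phi_via_mobius(m: int) -> int:
    """Right-hand side of (4.58): sum of mu(d) * (m/d) over divisors d of m."""
    return sum(mu(d) * (m // d) for d in range(1, m + 1) if m % d == 0)

def phi_by_counting(m: int) -> int:
    """The definition of the totient, for comparison."""
    return sum(1 for n in range(m) if gcd(n, m) == 1)

print(phi_via_mobius(12), phi_by_counting(12))   # 4 4
```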
For example, let’s try to figure out how many fractions are in the Farey series $\mathcal{F}_n$. This is the number of reduced fractions in $[0,1]$ whose denominators do not exceed $n$, so it is 1 greater than $\Phi(n)$ where we define

$$\Phi(x) = \sum_{1\le k\le x} \varphi(k). \tag{4.59}$$
(We must add 1 to $\Phi(n)$ because of the final fraction $\frac11$.) The sum in (4.59) looks difficult, but we can determine $\Phi(x)$ indirectly by observing that

$$\sum_{d\ge1} \Phi\Bigl(\frac xd\Bigr) = \frac12 \lfloor x\rfloor \lfloor 1+x\rfloor \tag{4.60}$$

for all real $x \ge 0$. Why does this identity hold? Well, it’s a bit awesome yet not really beyond our ken. There are $\frac12 \lfloor x\rfloor \lfloor 1+x\rfloor$ basic fractions $m/n$ with $0 \le m < n \le x$, counting both reduced and unreduced fractions; that gives us the right-hand side. The number of such fractions with $\gcd(m,n) = d$ is $\Phi(x/d)$, because such fractions are $m'/n'$ with $0 \le m' < n' \le x/d$ after replacing $m$ by $m'd$ and $n$ by $n'd$. So the left-hand side counts the same fractions in a different way, and the identity must be true.
Let’s look more closely at the situation, so that equations (4.59) and (4.60) become clearer. The definition of $\Phi(x)$ implies that $\Phi(x) = \Phi(\lfloor x\rfloor)$; but it turns out to be convenient to define $\Phi(x)$ for arbitrary real values, not just for integers. At integer values we have the table

$$\begin{array}{c|rrrrrrrrrrrrr}
n & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 \\ \hline
\varphi(n) & - & 1 & 1 & 2 & 2 & 4 & 2 & 6 & 4 & 6 & 4 & 10 & 4 \\
\Phi(n) & 0 & 1 & 2 & 4 & 6 & 10 & 12 & 18 & 22 & 28 & 32 & 42 & 46
\end{array}$$

[Margin note: (This extension to real values is a useful trick for many recurrences that arise in the analysis of algorithms.)]

and we can check (4.60) when $x = 12$:

$$\begin{aligned}
\Phi(12) + \Phi(6) + \Phi(4) + \Phi(3) + \Phi(2) + \Phi(2) + 6\cdot\Phi(1)
&= 46 + 12 + 6 + 4 + 2 + 2 + 6 \\
&= 78 = \tfrac12\cdot12\cdot13.
\end{aligned}$$

Amazing.
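The same check can be run for other values of $x$. The following Python sketch is an added illustration; `Phi`, `lhs_460`, and `rhs_460` are names of our own choosing, and the sum over $d$ is cut off at $\lfloor x\rfloor$ on the assumption (justified by $\Phi(x) = 0$ for $x < 1$) that later terms vanish.

```python
from math import floor, gcd

def phi(k: int) -> int:
    """Euler's totient by direct counting."""
    return sum(1 for n in range(k) if gcd(n, k) == 1)

def Phi(x: float) -> int:
    """Phi(x) from (4.59): the sum of phi(k) for 1 <= k <= x."""
    return sum(phi(k) for k in range(1, floor(x) + 1))

def lhs_460(x: float) -> int:
    """Sum of Phi(x/d) over d >= 1; the terms are zero once d > x."""
    return sum(Phi(x / d) for d in range(1, floor(x) + 1))

def rhs_460(x: float) -> int:
    """Half of floor(x) * floor(1 + x); the product is always even."""
    return floor(x) * floor(1 + x) // 2

print([Phi(n) for n in range(13)])
# [0, 1, 2, 4, 6, 10, 12, 18, 22, 28, 32, 42, 46]
print(lhs_460(12), rhs_460(12))   # 78 78
```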
Identity (4.60) can be regarded as an implicit recurrence for $\Phi(x)$; for example, we’ve just seen that we could have used it to calculate $\Phi(12)$ from certain values of $\Phi(m)$ with $m < 12$. And we can solve such recurrences by using another beautiful property of the Möbius function:

$$g(x) = \sum_{d\ge1} f(x/d) \iff f(x) = \sum_{d\ge1} \mu(d)\, g(x/d). \tag{4.61}$$
This inversion law holds for all functions $f$ such that $\sum_{k,d\ge1} |f(x/kd)| < \infty$; we can prove it as follows. Suppose $g(x) = \sum_{d\ge1} f(x/d)$. Then

$$\begin{aligned}
\sum_{d\ge1} \mu(d)\, g(x/d) &= \sum_{d\ge1} \mu(d) \sum_{k\ge1} f(x/kd) \\
&= \sum_{m\ge1} f(x/m) \sum_{d,k\ge1} \mu(d)\,[m=kd] \\
&= \sum_{m\ge1} f(x/m) \sum_{d\backslash m} \mu(d) = \sum_{m\ge1} f(x/m)\,[m=1] = f(x).
\end{aligned}$$

The proof in the other direction is essentially the same.
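So now we can solve the recurrence (4.60) for $\Phi(x)$:

$$\Phi(x) = \frac12 \sum_{d\ge1} \mu(d)\, \lfloor x/d\rfloor \lfloor 1 + x/d\rfloor. \tag{4.62}$$

This is always a finite sum. For example,

$$\begin{aligned}
\Phi(12) &= \tfrac12 (12\cdot13 - 6\cdot7 - 4\cdot5 + 0 - 2\cdot3 + 2\cdot3 - 1\cdot2 + 0 + 0 + 1\cdot2 - 1\cdot2 + 0) \\
&= 78 - 21 - 10 - 3 + 3 - 1 + 1 - 1 = 46.
\end{aligned}$$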
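Formula (4.62) is also easy to evaluate mechanically for integer arguments. The Python sketch below is our own illustration (the names `mobius_sieve`, `Phi_via_462`, and `Phi_direct` are hypothetical); it computes $\mu$ with a simple sieve rather than by any method described in the text.

```python
from math import gcd

def mobius_sieve(limit: int) -> list[int]:
    """mu(0..limit) by sieving: flip the sign once per prime, zero out multiples of p^2."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for multiple in range(p, limit + 1, p):
                if multiple > p:
                    is_prime[multiple] = False
                mu[multiple] = -mu[multiple]
            for multiple in range(p * p, limit + 1, p * p):
                mu[multiple] = 0
    return mu

def Phi_via_462(n: int) -> int:
    """Phi(n) from (4.62), for a nonnegative integer n."""
    mu = mobius_sieve(n)
    total = sum(mu[d] * (n // d) * (1 + n // d) for d in range(1, n + 1))
    return total // 2

def Phi_direct(n: int) -> int:
    """Phi(n) from the definition (4.59)."""
    return sum(1 for k in range(1, n + 1) for j in range(k) if gcd(j, k) == 1)

print(Phi_via_462(12), Phi_direct(12))   # 46 46
```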
In Chapter 9 we’ll see how to use (4.62) to get a good approximation to $\Phi(x)$; in fact, we’ll prove that

$$\Phi(x) = \frac3{\pi^2}\, x^2 + O(x \log x).$$

Therefore the function $\Phi(x)$ grows “smoothly”; it averages out the erratic behavior of $\varphi(k)$.
In keeping with the tradition established last chapter, let’s conclude this chapter with a problem that illustrates much of what we’ve just seen and that also points ahead to the next chapter. Suppose we have beads of $n$ different colors; our goal is to count how many different ways there are to string them into circular necklaces of length $m$. We can try to “name and conquer” this problem by calling the number of possible necklaces $N(m,n)$.
For example, with two colors of beads R and B, we can make necklaces of length 4 in $N(4,2) = 6$ different ways:

[Figure: the six necklaces of length 4 in two colors, with cyclic patterns RRRR, RRRB, RRBB, RBRB, RBBB, BBBB.]

All other ways are equivalent to one of these, because rotations of a necklace do not change it. However, reflections are considered to be different; in the case $m = 6$, for example, the two necklaces in the following figure are different from each other:

[Figure: a necklace of length 6 made of R and B beads, and its mirror image.]

The problem of counting these configurations was first solved by P. A. MacMahon in 1892 [212].
There’s no obvious recurrence for $N(m,n)$, but we can count the necklaces by breaking them each into linear strings in $m$ ways and considering the resulting fragments. For example, when $m = 4$ and $n = 2$ we get

RRRR  RRRR  RRRR  RRRR
RRBR  RRRB  BRRR  RBRR
RBBR  RRBB  BRRB  BBRR
RBRB  BRBR  RBRB  BRBR
RBBB  BRBB  BBRB  BBBR
BBBB  BBBB  BBBB  BBBB
Each of the $n^m$ possible patterns appears at least once in this array of $mN(m,n)$ strings, and some patterns appear more than once. How many times does a pattern $a_0 \ldots a_{m-1}$ appear? That’s easy: It’s the number of cyclic shifts $a_k \ldots a_{m-1} a_0 \ldots a_{k-1}$ that produce the same pattern as the original $a_0 \ldots a_{m-1}$. For example, BRBR occurs twice, because the four ways to cut the necklace formed from BRBR produce four cyclic shifts (BRBR, RBRB, BRBR, RBRB); two of these coincide with BRBR itself. This argument shows that

$$\begin{aligned}
mN(m,n) &= \sum_{a_0,\ldots,a_{m-1}\in S_n}\ \sum_{0\le k<m} [a_0 \ldots a_{m-1} = a_k \ldots a_{m-1} a_0 \ldots a_{k-1}] \\
&= \sum_{0\le k<m}\ \sum_{a_0,\ldots,a_{m-1}\in S_n} [a_0 \ldots a_{m-1} = a_k \ldots a_{m-1} a_0 \ldots a_{k-1}].
\end{aligned}$$

Here $S_n$ is a set of $n$ different colors.
Let’s see how many patterns satisfy $a_0 \ldots a_{m-1} = a_k \ldots a_{m-1} a_0 \ldots a_{k-1}$, when $k$ is given. For example, if $m = 12$ and $k = 8$, we want to count the number of solutions to

$$a_0 a_1 a_2 a_3 a_4 a_5 a_6 a_7 a_8 a_9 a_{10} a_{11} = a_8 a_9 a_{10} a_{11} a_0 a_1 a_2 a_3 a_4 a_5 a_6 a_7.$$

This means $a_0 = a_8 = a_4$; $a_1 = a_9 = a_5$; $a_2 = a_{10} = a_6$; and $a_3 = a_{11} = a_7$. So the values of $a_0$, $a_1$, $a_2$, and $a_3$ can be chosen in $n^4$ ways, and the remaining $a$’s depend on them. Does this look familiar? In general, the solution to

$$a_j = a_{(j+k) \bmod m},\qquad\text{for } 0 \le j < m$$

makes us equate $a_j$ with $a_{(j+kl) \bmod m}$ for $l = 1, 2, \ldots$; and we know that the multiples of $k$ modulo $m$ are $\{0, d, 2d, \ldots, m-d\}$, where $d = \gcd(k,m)$. Therefore the general solution is to choose $a_0, \ldots, a_{d-1}$ independently and then to set $a_j = a_{j-d}$ for $d \le j < m$. There are $n^d$ solutions.
We have just proved that

$$mN(m,n) = \sum_{0\le k<m} n^{\gcd(k,m)}.$$
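This identity can be tested against a brute-force enumeration for small cases. The Python sketch below is an added illustration, not the text’s method: it represents colors as integers $0, \ldots, n-1$ and picks the lexicographically smallest rotation as a canonical form, both of which are our own choices.

```python
from itertools import product
from math import gcd

def necklaces_brute_force(m: int, n: int) -> int:
    """Count strings of length m over n colors, identifying rotations."""
    seen = set()
    for pattern in product(range(n), repeat=m):
        # Canonical representative: the smallest of the m cyclic shifts.
        canonical = min(pattern[k:] + pattern[:k] for k in range(m))
        seen.add(canonical)
    return len(seen)

def necklaces_gcd_sum(m: int, n: int) -> int:
    """N(m, n) from m*N(m, n) = sum of n^gcd(k, m) over 0 <= k < m."""
    total = sum(n ** gcd(k, m) for k in range(m))
    return total // m

print(necklaces_brute_force(4, 2), necklaces_gcd_sum(4, 2))   # 6 6
```

Note that `gcd(0, m)` is `m`, so the term for $k = 0$ contributes $n^m$, the count of all patterns.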
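This sum can be simplified, since it includes only terms $n^d$ where $d\backslash m$. Substituting $d = \gcd(k,m)$ yields

$$\begin{aligned}
N(m,n) &= \frac1m \sum_{d\backslash m} n^d \sum_{0\le k<m} [d = \gcd(k,m)] \\
&= \frac1m \sum_{d\backslash m} n^d \sum_{0\le k<m} [k/d \perp m/d] \\
&= \frac1m \sum_{d\backslash m} n^d \sum_{0\le k<m/d} [k \perp m/d].
\end{aligned}$$

(We are allowed to replace $k/d$ by $k$ because $k$ must be a multiple of $d$.)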
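Finally, we have $\sum_{0\le k<m/d} [k \perp m/d] = \varphi(m/d)$ by definition, so we obtain MacMahon’s formula:

$$N(m,n) = \frac1m \sum_{d\backslash m} n^d\, \varphi\Bigl(\frac md\Bigr) = \frac1m \sum_{d\backslash m} \varphi(d)\, n^{m/d}. \tag{4.63}$$

When $m = 4$ and $n = 2$, for example, the number of necklaces is $\frac14 (1\cdot2^4 + 1\cdot2^2 + 2\cdot2^1) = 6$, just as we suspected.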
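MacMahon’s formula needs only the divisors of $m$, so it is much cheaper than enumerating patterns. The following Python sketch is an added illustration under our own naming (`phi`, `necklaces`); the totient is computed by direct counting for simplicity.

```python
from math import gcd

def phi(d: int) -> int:
    """Euler's totient by direct counting."""
    return sum(1 for k in range(d) if gcd(k, d) == 1)

def necklaces(m: int, n: int) -> int:
    """N(m, n) by MacMahon's formula (4.63)."""
    total = sum(phi(d) * n ** (m // d) for d in range(1, m + 1) if m % d == 0)
    return total // m

print(necklaces(4, 2))                          # 6
print([necklaces(m, 2) for m in range(1, 9)])   # [2, 3, 4, 6, 8, 14, 20, 36]
```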
It’s not immediately obvious that the value $N(m,n)$ defined by MacMahon’s sum is an integer! Let’s try to prove directly that

$$\sum_{d\backslash m} \varphi(d)\, n^{m/d} \equiv 0 \pmod m, \tag{4.64}$$

without using the clue that this is related to necklaces. In the special case that $m$ is prime, this congruence reduces to $n^p + (p-1)n \equiv 0 \pmod p$; that is, it reduces to $n^p \equiv n$. We’ve seen in (4.48) that this congruence is an alternative form of Fermat’s theorem. Therefore (4.64) holds when $m = p$; we can regard it as a generalization of Fermat’s theorem to the case when the modulus is not prime. (Euler’s generalization (4.50) is different.)
We’ve proved (4.64) for all prime moduli, so let’s look at the smallest case left, $m = 4$. We must prove that

$$n^4 + n^2 + 2n \equiv 0 \pmod 4.$$

The proof is easy if we consider even and odd cases separately. If $n$ is even, all three terms on the left are congruent to 0 modulo 4, so their sum is too. If $n$ is odd, $n^4$ and $n^2$ are each congruent to 1, and $2n$ is congruent to 2; hence the left side is congruent to $1 + 1 + 2$ and thus to 0 modulo 4, and we’re done.
Next, let’s be a bit daring and try $m = 12$. This value of $m$ ought to be interesting because it has lots of factors, including the square of a prime, yet it is fairly small. (Also there’s a good chance we’ll be able to generalize a proof for 12 to a proof for general $m$.) The congruence we must prove is

$$n^{12} + n^6 + 2n^4 + 2n^3 + 2n^2 + 4n \equiv 0 \pmod{12}.$$

Now what? By (4.42) this congruence holds if and only if it also holds modulo 3 and modulo 4. So let’s prove that it holds modulo 3. Our congruence (4.64) holds for primes, so we have $n^3 + 2n \equiv 0 \pmod 3$. Careful scrutiny reveals that we can use this fact to group terms of the larger sum:

$$\begin{aligned}
n^{12} + n^6 + 2n^4 + 2n^3 + 2n^2 + 4n &= (n^{12} + 2n^4) + (n^6 + 2n^2) + 2(n^3 + 2n) \\
&\equiv 0 + 0 + 2\cdot0 \equiv 0 \pmod 3.
\end{aligned}$$

So it works modulo 3.
We’re half done. To prove congruence modulo 4 we use the same trick. We’ve proved that $n^4 + n^2 + 2n \equiv 0 \pmod 4$, so we use this pattern to group:

$$\begin{aligned}
n^{12} + n^6 + 2n^4 + 2n^3 + 2n^2 + 4n &= (n^{12} + n^6 + 2n^3) + 2(n^4 + n^2 + 2n) \\
&\equiv 0 + 2\cdot0 \equiv 0 \pmod 4.
\end{aligned}$$

QED for the case $m = 12$.

[Margin note: QED: Quite Easily Done.]

So far we’ve proved our congruence for prime $m$, for $m = 4$, and for $m = 12$. Now let’s try to prove it for prime powers. For concreteness we may suppose that $m = p^3$ for some prime $p$. Then the left side of (4.64) is
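$$\begin{aligned}
n^{p^3} + \varphi(p)\, n^{p^2} + \varphi(p^2)\, n^p + \varphi(p^3)\, n
&= n^{p^3} + (p-1)\, n^{p^2} + (p^2-p)\, n^p + (p^3-p^2)\, n \\
&= (n^{p^3} - n^{p^2}) + p\,(n^{p^2} - n^p) + p^2 (n^p - n) + p^3 n.
\end{aligned}$$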
We can show that this is congruent to 0 modulo $p^3$ if we can prove that $n^{p^3} - n^{p^2}$ is divisible by $p^3$, that $n^{p^2} - n^p$ is divisible by $p^2$, and that $n^p - n$ is divisible by $p$, because the whole thing will then be divisible by $p^3$. By the alternative form of Fermat’s theorem we have $n^p \equiv n \pmod p$, so $p$ divides $n^p - n$; hence there is an integer $q$ such that

$$n^p = n + pq.$$
Now we raise both sides to the $p$th power, expand the right side according to the binomial theorem (which we’ll meet in Chapter 5), and regroup, giving

$$\begin{aligned}
n^{p^2} = (n + pq)^p &= n^p + (pq)^1 n^{p-1} \binom p1 + (pq)^2 n^{p-2} \binom p2 + \cdots \\
&= n^p + p^2 Q
\end{aligned}$$

for some other integer $Q$. We’re able to pull out a factor of $p^2$ here because $\binom p1 = p$ in the second term, and because a factor of $(pq)^2$ appears in all the terms that follow. So we find that $p^2$ divides $n^{p^2} - n^p$.
Again we raise both sides to the $p$th power, expand, and regroup, to get

$$\begin{aligned}
n^{p^3} = (n^p + p^2 Q)^p &= n^{p^2} + (p^2 Q)^1 n^{p(p-1)} \binom p1 + (p^2 Q)^2 n^{p(p-2)} \binom p2 + \cdots \\
&= n^{p^2} + p^3 \mathcal{Q}
\end{aligned}$$

for yet another integer $\mathcal{Q}$. So $p^3$ divides $n^{p^3} - n^{p^2}$. This finishes the proof for $m = p^3$, because we’ve shown that $p^3$ divides the left-hand side of (4.64).
Moreover we can prove by induction that

$$n^{p^k} = n^{p^{k-1}} + p^k \mathfrak{Q}$$

for some final integer $\mathfrak{Q}$ (final because we’re running out of fonts); hence

$$n^{p^k} \equiv n^{p^{k-1}} \pmod{p^k},\qquad\text{for } k > 0. \tag{4.65}$$

Thus the left side of (4.64), which is

$$(n^{p^k} - n^{p^{k-1}}) + p\,(n^{p^{k-1}} - n^{p^{k-2}}) + \cdots + p^{k-1}(n^p - n) + p^k n,$$

is divisible by $p^k$ and so is congruent to 0 modulo $p^k$.
We’re almost there. Now that we’ve proved (4.64) for prime powers, all that remains is to prove it when $m = m_1 m_2$, where $m_1 \perp m_2$, assuming that the congruence is true for $m_1$ and $m_2$. Our examination of the case $m = 12$, which factored into instances of $m = 3$ and $m = 4$, encourages us to think that this approach will work.

We know that the $\varphi$ function is multiplicative, so we can write

$$\begin{aligned}
\sum_{d\backslash m} \varphi(d)\, n^{m/d} &= \sum_{d_1\backslash m_1,\ d_2\backslash m_2} \varphi(d_1 d_2)\, n^{m_1 m_2/d_1 d_2} \\
&= \sum_{d_1\backslash m_1} \varphi(d_1) \Bigl(\sum_{d_2\backslash m_2} \varphi(d_2)\, (n^{m_1/d_1})^{m_2/d_2}\Bigr).
\end{aligned}$$

But the inner sum is congruent to 0 modulo $m_2$, because we’ve assumed that (4.64) holds for $m_2$; so the entire sum is congruent to 0 modulo $m_2$. By a symmetric argument, we find that the entire sum is congruent to 0 modulo $m_1$ as well. Thus by (4.42) it’s congruent to 0 modulo $m$. QED.
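A numerical spot check is no substitute for the proof, but it is reassuring. The Python sketch below is an added illustration with hypothetical names (`macmahon_sum`, `holds_464`, `holds_465`); it tries small moduli and a range of integers $n$, including negative ones.

```python
from math import gcd

def phi(d: int) -> int:
    """Euler's totient by direct counting."""
    return sum(1 for k in range(d) if gcd(k, d) == 1)

def macmahon_sum(m: int, n: int) -> int:
    """Left side of (4.64): sum of phi(d) * n^(m/d) over divisors d of m."""
    return sum(phi(d) * n ** (m // d) for d in range(1, m + 1) if m % d == 0)

def holds_464(m: int, n: int) -> bool:
    """Is the sum in (4.64) divisible by m?"""
    return macmahon_sum(m, n) % m == 0

def holds_465(n: int, p: int, k: int) -> bool:
    """(4.65): n^(p^k) and n^(p^(k-1)) agree modulo p^k, for prime p and k > 0."""
    modulus = p ** k
    return pow(n, p ** k, modulus) == pow(n, p ** (k - 1), modulus)

print(all(holds_464(m, n) for m in range(1, 31) for n in range(-5, 11)))   # True
print(all(holds_465(n, p, k)
          for p in (2, 3, 5, 7) for k in (1, 2, 3) for n in range(-5, 11)))  # True
```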
Exercises
Warmups
1 What is the smallest positive integer that has exactly $k$ divisors, for $1 \le k \le 6$?
2 Prove that $\gcd(m,n)\cdot\operatorname{lcm}(m,n) = m\cdot n$, and use this identity to express $\operatorname{lcm}(m,n)$ in terms of $\operatorname{lcm}(n \bmod m, m)$, when $n \bmod m \ne 0$. Hint: Use (4.12), (4.14), and (4.15).
3 Let $\pi(x)$ be the number of primes not exceeding $x$. Prove or disprove:
$$\pi(x) - \pi(x-1) = [x \text{ is prime}].$$
4 What would happen if the Stern-Brocot construction started with the five fractions $(\frac01, \frac10, \frac0{-1}, \frac{-1}0, \frac01)$ instead of with $(\frac01, \frac10)$?
5 Find simple formulas for $L^k$ and $R^k$, when $L$ and $R$ are the $2\times2$ matrices of (4.33).
6 What does ‘$a \equiv b \pmod 0$’ mean?
7 Ten people numbered 1 to 10 are lined up in a circle as in the Josephus problem, and every $m$th person is executed. (The value of $m$ may be much larger than 10.) Prove that the first three people to go cannot be 10, $k$, and $k+1$ (in this order), for any $k$.
8 The residue number system $(x \bmod 3, x \bmod 5)$ considered in the text has the curious property that 13 corresponds to $(1,3)$, which looks almost the same. Explain how to find all instances of such a coincidence, without calculating all fifteen pairs of residues. In other words, find all solutions to the congruences
$$10x + y \equiv x \pmod 3,\qquad 10x + y \equiv y \pmod 5.$$
Hint: Use the facts that $10u + 6v \equiv u \pmod 3$ and $10u + 6v \equiv v \pmod 5$.
9 Show that $(3^{77} - 1)/2$ is odd and composite. Hint: What is $3^{77} \bmod 4$?
10 Compute $\varphi(999)$.
11 Find a function $\sigma(n)$ with the property that
$$g(n) = \sum_{0\le k\le n} f(k) \iff f(n) = \sum_{0\le k\le n} \sigma(k)\, g(n-k).$$
(This is analogous to the Möbius function; see (4.56).)
12 Simplify the formula $\sum_{d\backslash m} \sum_{k\backslash d} \mu(k)\, g(d/k)$.
13 A positive integer $n$ is called squarefree if it is not divisible by $m^2$ for any $m > 1$. Find a necessary and sufficient condition that $n$ is squarefree,
a in terms of the prime-exponent representation (4.11) of $n$;
b in terms of $\mu(n)$.
Basics
14 Prove or disprove:
a $\gcd(km, kn) = k\gcd(m,n)$;
b $\operatorname{lcm}(km, kn) = k\operatorname{lcm}(m,n)$.
15 Does every prime occur as a factor of some Euclid number $e_n$?
16 What is the sum of the reciprocals of the first $n$ Euclid numbers?
17 Let $f_n$ be the “Fermat number” $2^{2^n} + 1$. Prove that $f_m \perp f_n$ if $m < n$.
18 Show that if $2^n + 1$ is prime then $n$ is a power of 2.
19 For every positive integer $n$ there’s a prime $p$ such that $n < p \le 2n$. (This is essentially “Bertrand’s postulate,” which Joseph Bertrand verified for $n < 3000000$ in 1845 and Chebyshev proved for all $n$ in 1850.) Use Bertrand’s postulate to prove that there’s a constant $b \approx 1.25$ such that the numbers
$$\lfloor 2^b\rfloor,\ \lfloor 2^{2^b}\rfloor,\ \lfloor 2^{2^{2^b}}\rfloor,\ \ldots$$
are all prime.
20 Let $P_n$ be the $n$th prime number. Find a constant $K$ such that
$$\lfloor (10^{n^2} K) \bmod 10^n \rfloor = P_n.$$
21 Prove the following identities when n is a positive integer:
Hint: This is a trick question and the answer is pretty easy.
22 The number 1111111111111111111 is prime. Prove that, in any radix $b$, $(11\ldots1)_b$ can be prime only if the number of 1’s is prime.
[Margin note: Is this a test for strabismus?]
23 State a recurrence for $\rho(k)$, the ruler function in the text’s discussion of $\epsilon_2(n!)$. Show that there’s a connection between $\rho(k)$ and the disk that’s moved at step $k$ when an $n$-disk Tower of Hanoi is being transferred in $2^n - 1$ moves, for $1 \le k \le 2^n - 1$.
24 Express $\epsilon_p(n!)$ in terms of $\nu_p(n)$, the sum of the digits in the radix $p$ representation of $n$, thereby generalizing (4.24).
[Margin note: Look, ma, sideways addition.]
25 We say that $m$ exactly divides $n$, written $m\backslash\!\backslash n$, if $m\backslash n$ and $m \perp n/m$. For example, in the text’s discussion of factorial factors, $p^{\epsilon_p(n!)}\backslash\!\backslash n!$. Prove or disprove the following:
a $k\backslash\!\backslash n$ and $m\backslash\!\backslash n \iff km\backslash\!\backslash n$, if $k \perp m$.
b For all $m, n > 0$, either $\gcd(m,n)\backslash\!\backslash m$ or $\gcd(m,n)\backslash\!\backslash n$.
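26 Consider the sequence $\mathcal{G}_N$ of all nonnegative reduced fractions $m/n$ such that $mn \le N$. For example,
$$\mathcal{G}_{10} = \tfrac01, \tfrac1{10}, \tfrac19, \tfrac18, \tfrac17, \tfrac16, \tfrac15, \tfrac14, \tfrac13, \tfrac25, \tfrac12, \tfrac23, \tfrac11, \tfrac32, \tfrac21, \tfrac52, \tfrac31, \tfrac41, \tfrac51, \tfrac61, \tfrac71, \tfrac81, \tfrac91, \tfrac{10}1.$$
Is it true that $m'n - mn' = 1$ whenever $m/n$ immediately precedes $m'/n'$ in $\mathcal{G}_N$?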
27 Give a simple rule for comparing rational numbers based on their representations as $L$’s and $R$’s in the Stern-Brocot number system.
28 The Stern-Brocot representation of $\pi$ is
$$\pi = R^3 L^7 R^{15} L R^{292} L R L R^2 L R^3 L R^{14} L^2 R \ldots;$$
use it to find all the simplest rational approximations to $\pi$ whose denominators are less than 50. Is $\frac{22}7$ one of them?
29 The text describes a correspondence between binary real numbers $x = (.b_1 b_2 b_3 \ldots)_2$ in $[0,1)$ and Stern-Brocot real numbers $\alpha = B_1 B_2 B_3 \ldots$ in $[0,\infty)$. If $x$ corresponds to $\alpha$ and $x \ne 0$, what number corresponds to $1 - x$?
30 Prove the following statement (the Chinese Remainder Theorem): Let $m_1, \ldots, m_r$ be integers with $m_j \perp m_k$ for $1 \le j < k \le r$; let $m = m_1 \ldots m_r$; and let $a_1, \ldots, a_r$, $A$ be integers. Then there is exactly one integer $a$ such that
$$a \equiv a_k \pmod{m_k} \text{ for } 1 \le k \le r \qquad\text{and}\qquad A \le a < A + m.$$
31
A number in decimal notation is divisible by 3 if and only if the sum of
its digits is divisible by 3. Prove this well-known rule, and generalize it.
32 Prove Euler’s theorem (4.50) by generalizing the proof of (4.47).
[Margin note: Why is “Euler” pronounced “Oiler” when “Euclid” is “Yooklid”?]
33 Show that if $f(m)$ and $g(m)$ are multiplicative functions, then so is $h(m) = \sum_{d\backslash m} f(d)\, g(m/d)$.
34 Prove that (4.56) is a special case of (4.61).
Homework exercises
35 Let $I(m,n)$ be a function that satisfies the relation
$$I(m,n)\,m + I(n,m)\,n = \gcd(m,n),$$
when $m$ and $n$ are nonnegative integers with $m \ne n$. Thus, $I(m,n) = m'$ and $I(n,m) = n'$ in (4.5); the value of $I(m,n)$ is an inverse of $m$ with respect to $n$. Find a recurrence that defines $I(m,n)$.
36 Consider the set $Z(\sqrt{10}) = \{m + n\sqrt{10} \mid \text{integer } m, n\}$. The number $m + n\sqrt{10}$ is called a unit if $m^2 - 10n^2 = \pm1$, since it has an inverse (that is, since $(m + n\sqrt{10})\cdot{\pm}(m - n\sqrt{10}) = 1$). For example, $3 + \sqrt{10}$ is a unit, and so is $19 - 6\sqrt{10}$. Pairs of cancelling units can be inserted into any factorization, so we ignore them. Nonunit numbers of $Z(\sqrt{10})$ are called prime if they cannot be written as a product of two nonunits. Show that 2, 3, and $4 \pm \sqrt{10}$ are primes of $Z(\sqrt{10})$. Hint: If $2 = (k + l\sqrt{10}) \times (m + n\sqrt{10})$ then $4 = (k^2 - 10l^2)(m^2 - 10n^2)$. Furthermore, the square of any integer mod 10 is 0, 1, 4, 5, 6, or 9.
37 Prove (4.17). Hint: Show that $e_n - \frac12 = (e_{n-1} - \frac12)^2 + \frac14$, and consider $2^{-n} \log(e_n - \frac12)$.
38 Prove that if $a \perp b$ and $a > b$ then
$$\gcd(a^m - b^m, a^n - b^n) = a^{\gcd(m,n)} - b^{\gcd(m,n)},\qquad 0 \le m < n.$$
(All variables are integers.) Hint: Use Euclid’s algorithm.
39 Let $S(m)$ be the smallest positive integer $n$ for which there exists an increasing sequence of integers
$$m = a_1 < a_2 < \cdots < a_t = n$$
such that $a_1 a_2 \ldots a_t$ is a perfect square. (If $m$ is a perfect square, we can let $t = 1$ and $n = m$.) For example, $S(2) = 6$ because the best such sequence is $2\cdot3\cdot6$. We have
$$\begin{array}{c|rrrrrrrrrrrr}
n & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 \\ \hline
S(n) & 1 & 6 & 8 & 4 & 10 & 12 & 14 & 15 & 9 & 18 & 22 & 20
\end{array}$$
Prove that $S(m) \ne S(m')$ whenever $0 < m < m'$.
40 If the radix $p$ representation of $n$ is $(a_m \ldots a_1 a_0)_p$, prove that
$$n!/p^{\epsilon_p(n!)} \equiv (-1)^{\epsilon_p(n!)}\, a_m! \ldots a_1!\, a_0! \pmod p.$$
(The left side is simply $n!$ with all $p$ factors removed. When $n = p$ this reduces to Wilson’s theorem.)
41 a Show that if $p \bmod 4 = 3$, there is no integer $n$ such that $p$ divides $n^2 + 1$. Hint: Use Fermat’s theorem.
b But show that if $p \bmod 4 = 1$, there is such an integer. Hint: Write $(p-1)!$ as $\bigl(\prod_{k=1}^{(p-1)/2} k(p-k)\bigr)$ and think about Wilson’s theorem.
42 Consider two fractions $m/n$ and $m'/n'$ in lowest terms. Prove that when the sum $m/n + m'/n'$ is reduced to lowest terms, the denominator will be $nn'$ if and only if $n \perp n'$. (In other words, $(mn' + m'n)/nn'$ will already be in lowest terms if and only if $n$ and $n'$ have no common factor.)
43 There are $2^k$ nodes at level $k$ of the Stern-Brocot tree, corresponding to the matrices $L^k$, $L^{k-1}R$, $\ldots$, $R^k$. Show that this sequence can be obtained by starting with $L^k$ and then multiplying successively by
$$\begin{pmatrix} 0 & -1 \\ 1 & 2\rho(n)+1 \end{pmatrix}$$
for $1 \le n < 2^k$, where $\rho(n)$ is the ruler function.
44 Prove that a baseball player whose batting average is .316 must have batted at least 19 times. (If he has $m$ hits in $n$ times at bat, then $m/n \in [.3155, .3165)$.)
[Margin note: Radio announcer: “. . . pitcher Mark LeChiffre hits a two-run single! Mark was batting only .080, so he gets his second hit of the year.” Anything wrong?]
45 The number 9376 has the peculiar self-reproducing property that
$$9376^2 = 87909376.$$
How many 4-digit numbers $x$ satisfy the equation $x^2 \bmod 10000 = x$? How many $n$-digit numbers $x$ satisfy the equation $x^2 \bmod 10^n = x$?
46 a Prove that if $n^j \equiv 1$ and $n^k \equiv 1 \pmod m$, then $n^{\gcd(j,k)} \equiv 1$.
b Show that $2^n \not\equiv 1 \pmod n$, if $n > 1$. Hint: Consider the least prime factor of $n$.
47 Show that if $n^{m-1} \equiv 1 \pmod m$ and if $n^{(m-1)/p} \not\equiv 1 \pmod m$ for all primes such that $p\backslash(m-1)$, then $m$ is prime. Hint: Show that if this condition holds, the numbers $n^k \bmod m$ are distinct, for $1 \le k < m$.
[Margin note: The proof that large numbers are prime is very easy: Let $x$ be a large prime number; then $x$ is prime, QED.]
48 Generalize Wilson’s theorem (4.49) by ascertaining the value of the expression $\bigl(\prod_{1\le n<m,\ n\perp m} n\bigr) \bmod m$, when $m > 1$.
[Margin note: Wilson’s theorem: “Martha, that boy is a menace.”]
49 Let $R(N)$ be the number of pairs of integers $(m,n)$ such that $0 \le m < N$, $0 \le n < N$, and $m \perp n$.
a Express $R(N)$ in terms of the $\Phi$ function.
b Prove that $R(N) = \sum_{d\ge1} \lfloor N/d\rfloor^2 \mu(d)$.
50 Let $m$ be a positive integer and let
$$\omega = e^{2\pi i/m} = \cos(2\pi/m) + i\sin(2\pi/m).$$
[Margin note: What are the roots of disunity?]
We say that $\omega$ is an $m$th root of unity, since $\omega^m = e^{2\pi i} = 1$. In fact, each of the $m$ complex numbers $\omega^0, \omega^1, \ldots, \omega^{m-1}$ is an $m$th root of unity, because $(\omega^k)^m = e^{2\pi ki} = 1$; therefore $z - \omega^k$ is a factor of the polynomial $z^m - 1$, for $0 \le k < m$. Since these factors are distinct, the complete factorization of $z^m - 1$ over the complex numbers must be
$$z^m - 1 = \prod_{0\le k<m} (z - \omega^k).$$
a Let $\Psi_m(z) = \prod_{0\le k<m,\ k\perp m} (z - \omega^k)$. (This polynomial of degree $\varphi(m)$ is called the cyclotomic polynomial of order $m$.) Prove that
$$z^m - 1 = \prod_{d\backslash m} \Psi_d(z).$$
b Prove that $\Psi_m(z) = \prod_{d\backslash m} (z^d - 1)^{\mu(m/d)}$.
Exam problems
51 Prove Fermat’s theorem (4.48) by expanding $(1 + 1 + \cdots + 1)^p$ via the multinomial theorem.
52 Let $n$ and $x$ be positive integers such that $x$ has no divisors $\le n$ (except 1), and let $p$ be a prime number. Prove that at least $\lfloor n/p\rfloor$ of the numbers $\{x - 1, x^2 - 1, \ldots, x^{n-1} - 1\}$ are multiples of $p$.
53 Find all positive integers $n$ such that $n \backslash \lfloor (n-1)!/(n+1) \rfloor$.
54 Determine the value of $1000! \bmod 10^{250}$ by hand calculation.
55 Let $P_n$ be the product of the first $n$ factorials, $\prod_{k=1}^n k!$. Prove that $P_{2n}/P_n^4$ is an integer, for all positive integers $n$.
56 Show that
$$\Bigl(\prod_{k=1}^{2n-1} k^{\min(k,\,2n-k)}\Bigr) \Big/ \Bigl(\prod_{k=1}^{n-1} (2k+1)^{2n-2k-1}\Bigr)$$
is a power of 2.
57 Let $S(m,n)$ be the set of all integers $k$ such that
$$m \bmod k + n \bmod k \ge k.$$
For example, $S(7,9) = \{2, 4, 5, 8, 10, 11, 12, 13, 14, 15, 16\}$. Prove that
$$\sum_{k\in S(m,n)} \varphi(k) = mn.$$
Hint: Prove first that $\sum_{1\le m\le n} \sum_{d\backslash m} \varphi(d) = \sum_{d\ge1} \varphi(d) \lfloor n/d\rfloor$. Then consider $\lfloor (m+n)/d\rfloor - \lfloor m/d\rfloor - \lfloor n/d\rfloor$.
58 Let $f(m) = \sum_{d\backslash m} d$. Find a necessary and sufficient condition that $f(m)$ is a power of 2.
Bonus problems
59 Prove that if $x_1, \ldots, x_n$ are positive integers with $1/x_1 + \cdots + 1/x_n = 1$, then $\max(x_1, \ldots, x_n) < e_n$. Hint: Prove the following stronger result by induction: “If $1/x_1 + \cdots + 1/x_n + 1/\alpha = 1$, where $x_1, \ldots, x_n$ are positive integers and $\alpha$ is a rational number $\ge \max(x_1, \ldots, x_n)$, then $\alpha + 1 \le e_{n+1}$ and $x_1 \ldots x_n (\alpha + 1) \le e_1 \ldots e_n e_{n+1}$.” (The proof is nontrivial.)
60 Prove that there’s a constant $P$ such that (4.18) gives only primes. You may use the following (highly nontrivial) fact: There is a prime between $p$ and $p + cp^\theta$, for some constant $c$ and all sufficiently large $p$, where $\theta = \frac{1051}{1920}$.
61 Prove that if $m/n$, $m'/n'$, and $m''/n''$ are consecutive elements of $\mathcal{F}_N$, then
$$m'' = \lfloor (n+N)/n' \rfloor m' - m,\qquad n'' = \lfloor (n+N)/n' \rfloor n' - n.$$
(This recurrence allows us to compute the elements of $\mathcal{F}_N$ in order, starting with $\frac01$ and $\frac1N$.)
62 What binary number corresponds to $e$, in the binary $\leftrightarrow$ Stern-Brocot correspondence? (Express your answer as an infinite sum; you need not evaluate it in closed form.)
63 Show that if Fermat’s Last Theorem (4.46) is false, the least $n$ for which it fails is prime. (You may assume that the result holds when $n = 4$.) Furthermore, if $a^p + b^p = c^p$ and $a \perp b$, show that there exists an integer $m$ such that
$$a + b = \begin{cases} m^p, & \text{if } p \nmid c; \\ p^{p-1} m^p, & \text{if } p\backslash c. \end{cases}$$
Thus $c$ must be really huge. Hint: Let $x = a + b$, and note that $\gcd\bigl(x, (a^p + (x-a)^p)/x\bigr) = \gcd(x, p\,a^{p-1})$.
64 The Peirce sequence $\mathcal{P}_N$ of order N is an infinite string of fractions separated by ‘<’ or ‘=’ signs, containing all the nonnegative fractions m/n with $m \ge 0$ and $n \le N$ (including fractions that are not reduced). It is defined recursively by starting with
$$\mathcal{P}_1 = 0/1 < 1/1 < 2/1 < 3/1 < 4/1 < \cdots.$$
For $N \ge 1$, we form $\mathcal{P}_{N+1}$ by inserting two symbols just before the kNth symbol of $\mathcal{P}_N$, for all k > 0. The two inserted symbols are
$$\frac{k-1}{N+1}\ \ =, \qquad \text{if } kN \text{ is odd};$$
$$\mathcal{P}_{N,kN}\ \ \frac{k-1}{N+1}, \qquad \text{if } kN \text{ is even}.$$
Here $\mathcal{P}_{N,j}$ denotes the jth symbol of $\mathcal{P}_N$, which will be either ‘<’ or ‘=’ when j is even; it will be a fraction when j is odd. For example,
$$\mathcal{P}_2 = 0/2 = 0/1 < 1/2 < 2/2 = 1/1 < 3/2 < 4/2 = 2/1 < 5/2 < 6/2 = 3/1 < \cdots;$$
$$\mathcal{P}_3 = 0/2 = 0/3 = 0/1 < 1/3 < 1/2 < 2/3 < 2/2 = 3/3 = 1/1 < 4/3 < 3/2 < 5/3 < 4/2 = \cdots;$$
$$\mathcal{P}_4 = 0/2 = 0/4 = 0/3 = 0/1 < 1/4 < 1/3 < 2/4 = 1/2 < 2/3 < 3/4 < 2/2 = 4/4 = 3/3 = \cdots;$$
$$\mathcal{P}_5 = 0/2 = 0/4 = 0/5 = 0/3 = 0/1 < 1/5 < 1/4 < 1/3 < 2/5 < 2/4 = 1/2 < 3/5 < 2/3 < 3/4 < 4/5 < 2/2 = 4/4 = \cdots;$$
$$\mathcal{P}_6 = 0/2 = 0/4 = 0/6 = 0/5 = 0/3 = 0/1 < 1/6 < 1/5 < 1/4 < 2/6 = 1/3 < 2/5 < 2/4 = 3/6 = 1/2 < 3/5 < 4/6 = \cdots.$$
(Equal elements occur in a slightly peculiar order.) Prove that the ‘<’ and ‘=’ signs defined by the rules above correctly describe the relations between adjacent fractions in the Peirce sequence.
Research problems
65 Are the Euclid numbers $e_n$ all squarefree?
66 Are the Mersenne numbers $2^p - 1$ all squarefree?
67 Prove or disprove that $\max_{1 \le j < k \le n} a_k/\gcd(a_j, a_k) \ge n$, for all sequences of integers $0 < a_1 < \cdots < a_n$.
68 Is there a constant Q such that $\lfloor Q^{2^n} \rfloor$ is prime for all $n \ge 0$?
69 Let $P_n$ denote the nth prime. Prove or disprove that $P_{n+1} - P_n = O(\log P_n)^2$.
70 Does $\epsilon_3(n!) = \epsilon_2(n!)/2$ for infinitely many n?
71 Prove or disprove: If $k \ne 1$ there exists n > 1 such that $2^n \equiv k \pmod{n}$. Are there infinitely many such n?
72 Prove or disprove: For all integers a, there exist infinitely many n such that $\varphi(n) \backslash (n + a)$.
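73 If the $\Phi(n) + 1$ terms of the Farey series
$$\mathcal{F}_n = \langle \mathcal{F}_n(0), \mathcal{F}_n(1), \ldots, \mathcal{F}_n(\Phi(n)) \rangle$$
were fairly evenly distributed, we would expect $\mathcal{F}_n(k) \approx k/\Phi(n)$. Therefore the sum $D(n) = \sum_{k=0}^{\Phi(n)} |\mathcal{F}_n(k) - k/\Phi(n)|$ measures the “deviation of $\mathcal{F}_n$ from uniformity.” Is it true that $D(n) = O(n^{1/2+\epsilon})$ for all $\epsilon > 0$?
74 Approximately how many distinct values are there in the set $\{0! \bmod p,\ 1! \bmod p,\ \ldots,\ (p-1)! \bmod p\}$, as $p \to \infty$?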
Binomial Coefficients
Lucky us!
Otherwise known as combinations of n things, k at a time.
LET’S TAKE A BREATHER. The previous chapters have seen some heavy
going, with sums involving floor, ceiling, mod, phi, and mu functions. Now
we’re going to study binomial coefficients, which turn out to be (a) more
important in applications, and (b) easier to manipulate, than all those other
quantities.
5.1
BASIC IDENTITIES
The symbol $\binom{n}{k}$ is a binomial coefficient, so called because of an important property we look at later in this section, the binomial theorem. But we read the symbol “n choose k.” This incantation arises from its combinatorial interpretation: it is the number of ways to choose a k-element subset from an n-element set. For example, from the set {1,2,3,4} we can choose two elements in six ways,
$$\{1,2\},\ \{1,3\},\ \{1,4\},\ \{2,3\},\ \{2,4\},\ \{3,4\};$$
so $\binom{4}{2} = 6$.
To express the number $\binom{n}{k}$ in more familiar terms it’s easiest to first determine the number of k-element sequences, rather than subsets, chosen from an n-element set; for sequences, the order of the elements counts. We use the same argument we used in Chapter 4 to show that n! is the number of permutations of n objects. There are n choices for the first element of the sequence; for each, there are n−1 choices for the second; and so on, until there are n−k+1 choices for the kth. This gives $n(n-1)\ldots(n-k+1) = n^{\underline{k}}$ choices in all. And since each k-element subset has exactly k! different orderings, this number of sequences counts each subset exactly k! times. To get our answer, we simply divide by k!:
$$\binom{n}{k} = \frac{n(n-1)\ldots(n-k+1)}{k(k-1)\ldots(1)}.$$
For example,
$$\binom{4}{2} = \frac{4 \cdot 3}{2 \cdot 1} = 6;$$
this agrees with our previous enumeration.
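The counting argument above can be replayed mechanically. The following is a minimal sketch, assuming nonnegative integers n and k; the helper name `choose` is hypothetical and not from the text. It counts ordered sequences and unordered subsets of {1,2,3,4} by brute force and compares them with the quotient of the falling product by k!.

```python
from itertools import combinations, permutations

def choose(n, k):
    """n(n-1)...(n-k+1) divided by k!, for integers n >= 0 and k >= 0."""
    sequences = 1                      # number of k-element sequences
    for j in range(k):
        sequences *= n - j
    orderings = 1                      # k!, the orderings of each subset
    for j in range(1, k + 1):
        orderings *= j
    return sequences // orderings      # exact: each subset is counted k! times

items = [1, 2, 3, 4]
ordered = list(permutations(items, 2))    # sequences, where order counts
subsets = list(combinations(items, 2))    # subsets, where order is ignored
print(len(ordered), len(subsets), choose(4, 2))   # 12 6 6
```

The brute-force enumeration is only practical for tiny sets; the product formula is what one would actually compute with.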
We call n the upper index and k the lower index. The indices are restricted to be nonnegative integers by the combinatorial interpretation, because sets don’t have negative or fractional numbers of elements. But the binomial coefficient has many uses besides its combinatorial interpretation, so we will remove some of the restrictions. It’s most useful, it turns out, to allow an arbitrary real (or even complex) number to appear in the upper index, and to allow an arbitrary integer in the lower. Our formal definition therefore takes the following form:
$$\binom{r}{k} = \begin{cases} \dfrac{r(r-1)\ldots(r-k+1)}{k(k-1)\ldots(1)} = \dfrac{r^{\underline{k}}}{k!}, & \text{integer } k \ge 0; \\[1ex] 0, & \text{integer } k < 0. \end{cases} \tag{5.1}$$
This definition has several noteworthy features. First, the upper index is called r, not n; the letter r emphasizes the fact that binomial coefficients make sense when any real number appears in this position. For instance, we have $\binom{-1}{3} = (-1)(-2)(-3)/(3 \cdot 2 \cdot 1) = -1$. There’s no combinatorial interpretation here, but r = −1 turns out to be an important special case. A noninteger index like r = −1/2 also turns out to be useful.
Second, we can view $\binom{r}{k}$ as a kth-degree polynomial in r. We’ll see that this viewpoint is often helpful.
Third, we haven’t defined binomial coefficients for noninteger lower in-
dices. A reasonable definition can be given, but actual applications are rare,
so we will defer this generalization to later in the chapter.
Final note: We’ve listed the restrictions ‘integer $k \ge 0$’ and ‘integer k < 0’ at the right of the definition. Such restrictions will be listed in all the identities we will study, so that the range of applicability will be clear. In general the fewer restrictions the better, because an unrestricted identity
is most useful; still, any restrictions that apply are an important part of
the identity. When we manipulate binomial coefficients, it’s easier to ignore
difficult-to-remember restrictions temporarily and to check later that nothing
has been violated. But the check needs to be made.
For example, almost every time we encounter $\binom{n}{n}$ it equals 1, so we can get lulled into thinking that it’s always 1. But a careful look at definition (5.1) tells us that $\binom{n}{n}$ is 1 only when $n \ge 0$ (assuming that n is an integer); when n < 0 we have $\binom{n}{n} = 0$. Traps like this can (and will) make life adventuresome.
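Definition (5.1) translates almost line for line into code. The sketch below is an illustration under stated assumptions: the function name `binom` is hypothetical, the upper index is restricted to exact rationals (via `fractions.Fraction`) so that results are exact, and the lower index must be an integer.

```python
from fractions import Fraction

def binom(r, k):
    """Binomial coefficient per definition (5.1): rational r, integer k."""
    if k < 0:
        return Fraction(0)             # second case of (5.1)
    result = Fraction(1)
    for j in range(k):
        # multiply in the factor (r - j)/(j + 1); after k steps this is
        # r(r-1)...(r-k+1) / k!
        result = result * (Fraction(r) - j) / (j + 1)
    return result

print(binom(4, 2))                     # 6
print(binom(-1, 3))                    # -1
print(binom(Fraction(-1, 2), 2))       # 3/8
print(binom(3, 3), binom(-3, -3))      # 1 0
```

The last line shows the trap described above: the coefficient with equal upper and lower indices is 1 for a nonnegative index but 0 for a negative one.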
Binomial coefficients were well known in Asia, many centuries before Pascal was born [74], but he had no way to know that.
In Italy it’s called Tartaglia’s triangle.
Before getting to the identities that we will use to tame binomial coefficients, let’s take a peek at some small values. The numbers in Table 155 form the beginning of Pascal’s triangle, named after Blaise Pascal (1623-1662) because he wrote an influential treatise about them [227]. The empty entries in this table are actually 0’s, because of a zero in the numerator of (5.1); for example, $\binom{1}{2} = (1 \cdot 0)/(2 \cdot 1) = 0$. These entries have been left blank simply to help emphasize the rest of the table.

Table 155 Pascal’s triangle. (The entry in row n, column k is $\binom{n}{k}$.)

  n   k=0    1    2    3    4    5    6    7    8    9   10
  0     1
  1     1    1
  2     1    2    1
  3     1    3    3    1
  4     1    4    6    4    1
  5     1    5   10   10    5    1
  6     1    6   15   20   15    6    1
  7     1    7   21   35   35   21    7    1
  8     1    8   28   56   70   56   28    8    1
  9     1    9   36   84  126  126   84   36    9    1
 10     1   10   45  120  210  252  210  120   45   10    1
It’s worthwhile to memorize formulas for the first three columns,
$$\binom{r}{0} = 1, \qquad \binom{r}{1} = r, \qquad \binom{r}{2} = \frac{r(r-1)}{2}; \tag{5.2}$$
these hold for arbitrary reals. (Recall that $\binom{n+1}{2} = \frac{1}{2}n(n+1)$ is the formula we derived for triangular numbers in Chapter 1; triangular numbers are conspicuously present in the $\binom{n}{2}$ column of Table 155.) It’s also a good idea to memorize the first five rows or so of Pascal’s triangle, so that when the pattern 1, 4, 6, 4, 1 appears in some problem we will have a clue that binomial coefficients probably lurk nearby.
The numbers in Pascal’s triangle satisfy, practically speaking, infinitely
many identities, so it’s not too surprising that we can find some surprising
relationships by looking closely. For example, there’s a curious “hexagon
property,” illustrated by the six numbers 56, 28, 36, 120, 210, 126 that sur-
round 84 in the lower right portion of Table 155. Both ways of multiplying
alternate numbers from this hexagon give the same product: 56·36·210 =
28·120·126 = 423360. The same thing holds if we extract such a hexagon
from any other part of Pascal’s triangle.
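The hexagon property is easy to spot-check by machine. In the sketch below the function name `hexagon_products` is hypothetical, and the six neighbors of $\binom{n}{k}$ are taken from the left-justified layout of Table 155 (two entries in the row above, two in the same row, two in the row below); it relies on `math.comb` from the Python standard library, which handles only nonnegative integer arguments.

```python
from math import comb

def hexagon_products(n, k):
    """Return the two products of alternate neighbors of binom(n, k).

    Assumes integers n >= 1 and k >= 1 so that every index is nonnegative.
    """
    first = comb(n - 1, k - 1) * comb(n, k + 1) * comb(n + 1, k)
    second = comb(n - 1, k) * comb(n + 1, k + 1) * comb(n, k - 1)
    return first, second

# The hexagon around 84 = binom(9, 3) from the text.
print(hexagon_products(9, 3))   # (423360, 423360)
```

A numerical check over many positions is evidence, not a proof; the factorial representation is what actually establishes the property.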
And now the identities. Our goal in this section will be to learn a few simple rules by which we can solve the vast majority of practical problems involving binomial coefficients.
“C’est une chose estrange combien il est fertile en proprietez.” -B. Pascal [227]
Definition (5.1) can be recast in terms of factorials in the common case that the upper index r is an integer, n, that’s greater than or equal to the lower index k:
$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}, \qquad \text{integers } n \ge k \ge 0. \tag{5.3}$$
To get this formula, we just multiply the numerator and denominator of (5.1) by (n − k)!. It’s occasionally useful to expand a binomial coefficient into this
factorial form (for example, when proving the hexagon property). And we
often want to go the other way, changing factorials into binomials.
The factorial representation hints at a symmetry in Pascal’s triangle: Each row reads the same left-to-right as right-to-left. The identity reflecting this, called the symmetry identity, is obtained by changing k to n − k:
$$\binom{n}{k} = \binom{n}{n-k}, \qquad \text{integer } n \ge 0,\ \text{integer } k. \tag{5.4}$$
This formula makes combinatorial sense, because by specifying the k chosen things out of n we’re in effect specifying the n − k unchosen things.
The restriction that n and k be integers in identity (5.4) is obvious, since each lower index must be an integer. But why can’t n be negative? Suppose, for example, that n = −1. Is
$$\binom{-1}{k} = \binom{-1}{-1-k}$$
a valid equation? No. For instance, when k = 0 we get 1 on the left and 0 on the right. In fact, for any integer $k \ge 0$ the left side is
$$\binom{-1}{k} = \frac{(-1)(-2)\ldots(-k)}{k!} = (-1)^k,$$
which is either 1 or −1; but the right side is 0, because the lower index is negative. And for negative k the left side is 0 but the right side is
$$\binom{-1}{-1-k} = (-1)^{-1-k},$$
which is either 1 or −1. So the equation ‘$\binom{-1}{k} = \binom{-1}{-1-k}$’ is always false!
The symmetry identity fails for all other negative integers n, too. But unfortunately it’s all too easy to forget this restriction, since the expression in the upper index is sometimes negative only for obscure (but legal) values of its variables. Everyone who’s manipulated binomial coefficients much has fallen into this trap at least three times.
I just hope I don’t fall into this trap during the midterm.
But the symmetry identity does have a big redeeming feature: It works for all values of k, even when k < 0 or k > n. (Because both sides are zero in such cases.) Otherwise $0 \le k \le n$, and symmetry follows immediately from (5.3):
$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!} = \frac{n!}{\bigl(n-(n-k)\bigr)!\,(n-k)!} = \binom{n}{n-k}.$$
Our next important identity lets us move things in and out of binomial coefficients:
$$\binom{r}{k} = \frac{r}{k}\binom{r-1}{k-1}, \qquad \text{integer } k \ne 0. \tag{5.5}$$
The restriction on k prevents us from dividing by 0 here. We call (5.5) an absorption identity, because we often use it to absorb a variable into a binomial coefficient when that variable is a nuisance outside. The equation follows from definition (5.1), because $r^{\underline{k}} = r(r-1)^{\underline{k-1}}$ and $k! = k(k-1)!$ when k > 0; both sides are zero when k < 0.
If we multiply both sides of (5.5) by k, we get an absorption identity that works even when k = 0:
$$k\binom{r}{k} = r\binom{r-1}{k-1}, \qquad \text{integer } k. \tag{5.6}$$
This one also has a companion that keeps the lower index intact:
$$(r-k)\binom{r}{k} = r\binom{r-1}{k}, \qquad \text{integer } k. \tag{5.7}$$
We can derive (5.7) by sandwiching an application of (5.6) between two applications of symmetry:
$$(r-k)\binom{r}{k} = (r-k)\binom{r}{r-k} \qquad \text{(by symmetry)}$$
$$= r\binom{r-1}{r-k-1} \qquad \text{(by (5.6))}$$
$$= r\binom{r-1}{k}. \qquad \text{(by symmetry)}$$
But wait a minute. We’ve claimed that the identity holds for all real r, yet the derivation we just gave holds only when r is a positive integer. (The upper index r − 1 must be a nonnegative integer if we’re to use the symmetry property (5.4) with impunity.) Have we been cheating? No. It’s true that the derivation is valid only for positive integers r; but we can claim that the identity holds for all values of r, because both sides of (5.7) are polynomials in r of degree k + 1. A nonzero polynomial of degree d or less can have at most d distinct zeros; therefore the difference of two such polynomials, which also has degree d or less, cannot be zero at more than d points unless it is identically zero. In other words, if two polynomials of degree d or less agree at more than d points, they must agree everywhere. We have shown that $(r-k)\binom{r}{k} = r\binom{r-1}{k}$ whenever r is a positive integer; so these two polynomials agree at infinitely many points, and they must be identically equal.
(Well, not here anyway.)
The proof technique in the previous paragraph, which we will call the
polynomial argument, is useful for extending many identities from integers
to reals; we’ll see it again and again. Some equations, like the symmetry
identity (5.4), are not identities between polynomials, so we can’t always use
this method. But many identities do have the necessary form.
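Because the polynomial argument extends the absorption identities to every real r, they can be spot-checked at non-integer rational values of r, where no combinatorial interpretation is available. The sketch below is a numerical illustration only: `binom` and `check_absorption` are hypothetical helper names, and exact rational arithmetic stands in for "real r".

```python
from fractions import Fraction

def binom(r, k):
    """Definition (5.1) for rational r and integer k (hypothetical helper)."""
    if k < 0:
        return Fraction(0)
    result = Fraction(1)
    for j in range(k):
        result = result * (Fraction(r) - j) / (j + 1)
    return result

def check_absorption(r, k):
    """True if (5.5), (5.6), and (5.7) all hold for this r and k."""
    r = Fraction(r)
    ok_55 = (k == 0) or binom(r, k) == r / k * binom(r - 1, k - 1)
    ok_56 = k * binom(r, k) == r * binom(r - 1, k - 1)
    ok_57 = (r - k) * binom(r, k) == r * binom(r - 1, k)
    return ok_55 and ok_56 and ok_57

samples = [Fraction(-7, 2), Fraction(-1), Fraction(0), Fraction(1, 3), Fraction(5)]
print(all(check_absorption(r, k) for r in samples for k in range(-3, 7)))   # True
```

Note that (5.5) is skipped when k = 0, matching its stated restriction.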
For example, here’s another polynomial identity, perhaps the most important binomial identity of all, known as the addition formula:
$$\binom{r}{k} = \binom{r-1}{k} + \binom{r-1}{k-1}, \qquad \text{integer } k. \tag{5.8}$$
When r is a positive integer, the addition formula tells us that every number in Pascal’s triangle is the sum of two numbers in the previous row, one directly above it and the other just to the left. And the formula applies also when r is negative, real, or complex; the only restriction is that k be an integer, so that the binomial coefficients are defined.
One way to prove the addition formula is to assume that r is a positive integer and to use the combinatorial interpretation. Recall that $\binom{r}{k}$ is the number of possible k-element subsets chosen from an r-element set. If we have a set of r eggs that includes exactly one bad egg, there are $\binom{r}{k}$ ways to select k of the eggs. Exactly $\binom{r-1}{k}$ of these selections involve nothing but good eggs; and $\binom{r-1}{k-1}$ of them contain the bad egg, because such selections have k−1 of the r−1 good eggs. Adding these two numbers together gives (5.8). This derivation assumes that r is a positive integer, and that $k \ge 0$. But both sides of the identity are zero when k < 0, and the polynomial argument establishes (5.8) in all remaining cases.
We can also derive (5.8) by adding together the two absorption identities (5.7) and (5.6):
$$(r-k)\binom{r}{k} + k\binom{r}{k} = r\binom{r-1}{k} + r\binom{r-1}{k-1};$$
the left side is $r\binom{r}{k}$, and we can divide through by r. This derivation is valid for everything but r = 0, and it’s easy to check that remaining case.
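Those of us who tend not to discover such slick proofs, or who are otherwise into tedium, might prefer to derive (5.8) by a straightforward manipulation of the definition. If k > 0,
$$\binom{r-1}{k} + \binom{r-1}{k-1} = \frac{(r-1)^{\underline{k}}}{k!} + \frac{(r-1)^{\underline{k-1}}}{(k-1)!}$$
$$= \frac{(r-1)^{\underline{k-1}}(r-k)}{k!} + \frac{(r-1)^{\underline{k-1}}\,k}{k!}$$
$$= \frac{(r-1)^{\underline{k-1}}\,r}{k!} = \frac{r^{\underline{k}}}{k!} = \binom{r}{k}.$$
Again, the cases for $k \le 0$ are easy to handle.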
We’ve just seen three rather different proofs of the addition formula. This
is not surprising; binomial coefficients have many useful properties, several of
which are bound to lead to proofs of an identity at hand.
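Read as a recurrence, the addition formula (5.8) generates Pascal’s triangle row by row. The following sketch assumes nonnegative integer upper indices only; the function name `pascal_rows` is hypothetical.

```python
def pascal_rows(count):
    """Return the first `count` rows of Pascal's triangle as lists."""
    rows = []
    row = [1]                          # row 0
    for _ in range(count):
        rows.append(row)
        # Addition formula: entry k of the next row is row[k] + row[k-1],
        # where entries outside the current row count as 0.
        row = [above + left for above, left in zip(row + [0], [0] + row)]
    return rows

for n, row in enumerate(pascal_rows(6)):
    print(n, row)
```

Padding the row with a zero on each side is just a compact way to express that the coefficients vanish when the lower index is negative or exceeds the upper index.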
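The addition formula is essentially a recurrence for the numbers of Pascal’s triangle, so we’ll see that it is especially useful for proving other identities by induction. We can also get a new identity immediately by unfolding the recurrence. For example,
$$\binom{5}{3} = \binom{4}{3} + \binom{4}{2}$$
$$= \binom{4}{3} + \binom{3}{2} + \binom{3}{1}$$
$$= \binom{4}{3} + \binom{3}{2} + \binom{2}{1} + \binom{2}{0}$$
$$= \binom{4}{3} + \binom{3}{2} + \binom{2}{1} + \binom{1}{0} + \binom{1}{-1}.$$
Since $\binom{1}{-1} = 0$, that term disappears and we can stop. This method yields the general formula
$$\sum_{k \le n} \binom{r+k}{k} = \binom{r}{0} + \binom{r+1}{1} + \cdots + \binom{r+n}{n} = \binom{r+n+1}{n}, \qquad \text{integer } n. \tag{5.9}$$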
Notice that we don’t need the lower limit k 3 0 on the index of summation,
because the terms with k < 0 are zero.
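Identity (5.9) can be checked numerically for rational r as well as for integers. In the sketch below, `binom` and `parallel_sum` are hypothetical helper names; the sum starts at k = 0 because, as noted above, terms with negative k vanish.

```python
from fractions import Fraction

def binom(r, k):
    """Definition (5.1) for rational r and integer k (hypothetical helper)."""
    if k < 0:
        return Fraction(0)
    result = Fraction(1)
    for j in range(k):
        result = result * (Fraction(r) - j) / (j + 1)
    return result

def parallel_sum(r, n):
    """Left side of (5.9): sum of binom(r+k, k) over 0 <= k <= n."""
    return sum(binom(Fraction(r) + k, k) for k in range(0, n + 1))

# The unfolding example from the text: r = 1, n = 3.
print(parallel_sum(1, 3), binom(5, 3))                                   # 10 10
# A non-integer upper index.
print(parallel_sum(Fraction(1, 2), 2), binom(Fraction(7, 2), 2))         # 35/8 35/8
```

For negative n the range is empty, so the sum is 0, which agrees with the right side of (5.9).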
This formula expresses one binomial coefficient as the sum of others whose
upper and lower indices stay the same distance apart. We found it by repeat-
edly expanding the binomial coefficient with the smallest lower index: first
Let’s look at this derivation blow by blow. The key step is in the second line, where we apply the symmetry law (5.4) to replace $\binom{m+k}{k}$ by $\binom{m+k}{m}$. We’re allowed to do this only when $m + k \ge 0$, so our first step restricts the range of k by discarding the terms with k < −m. (This is legal because those terms are zero.) Now we’re almost ready to apply (5.10); the third line sets this up, replacing k by k − m and tidying up the range of summation. This step, like the first, merely plays around with $\sum$-notation. Now k appears by itself in the upper index and the limits of summation are in the proper form, so the fourth line applies (5.10). One more use of symmetry finishes the job.
Certain sums that we did in Chapters 1 and 2 were actually special cases of (5.10), or disguised versions of this identity. For example, the case m = 1 gives the sum of the nonnegative integers up through n:
$$\binom{0}{1} + \binom{1}{1} + \cdots + \binom{n}{1} = 0 + 1 + \cdots + n = \frac{(n+1)n}{2} = \binom{n+1}{2}.$$
And the general case is equivalent to Chapter 2’s rule
$$\sum_{0 \le k \le n} k^{\underline{m}} = \frac{(n+1)^{\underline{m+1}}}{m+1}, \qquad \text{integers } m, n \ge 0,$$
if we divide both sides of this formula by m!. In fact, the addition formula (5.8) tells us that
$$\Delta\left(\binom{x}{m}\right) = \binom{x+1}{m} - \binom{x}{m} = \binom{x}{m-1},$$
if we replace r and k respectively by x + 1 and m. Hence the methods of Chapter 2 give us the handy indefinite summation formula
$$\sum \binom{x}{m}\,\delta x = \binom{x}{m+1} + C. \tag{5.11}$$
Binomial coefficients get their name from the binomial theorem, which deals with powers of the binomial expression x + y. Let’s look at the smallest cases of this theorem:
$$(x+y)^0 = 1x^0y^0$$
$$(x+y)^1 = 1x^1y^0 + 1x^0y^1$$
$$(x+y)^2 = 1x^2y^0 + 2x^1y^1 + 1x^0y^2$$
$$(x+y)^3 = 1x^3y^0 + 3x^2y^1 + 3x^1y^2 + 1x^0y^3$$
$$(x+y)^4 = 1x^4y^0 + 4x^3y^1 + 6x^2y^2 + 4x^1y^3 + 1x^0y^4.$$
It’s not hard to see why these coefficients are the same as the numbers in Pascal’s triangle: When we expand the product
$$(x+y)^n = \overbrace{(x+y)(x+y)\ldots(x+y)}^{n \text{ factors}},$$
every term is itself the product of n factors, each either an x or y. The number of such terms with k factors of x and n − k factors of y is the coefficient of $x^ky^{n-k}$ after we combine like terms. And this is exactly the number of ways to choose k of the n binomials from which an x will be contributed; that is, it’s $\binom{n}{k}$.
Some textbooks leave the quantity $0^0$ undefined, because the functions $x^0$ and $0^x$ have different limiting values when x decreases to 0. But this is a mistake. We must define
$$x^0 = 1, \qquad \text{for all } x,$$
if the binomial theorem is to be valid when x = 0, y = 0, and/or x = −y. The theorem is too important to be arbitrarily restricted! By contrast, the function $0^x$ is quite unimportant.
But what exactly is the binomial theorem? In its full glory it is the following identity:
$$(x+y)^r = \sum_k \binom{r}{k} x^k y^{r-k}, \qquad \text{integer } r \ge 0 \text{ or } |x/y| < 1. \tag{5.12}$$
“At the age of twenty-one he [Moriarty] wrote a treatise upon the Binomial Theorem, which has had a European vogue. On the strength of it, he won the Mathematical Chair at one of our smaller Universities.” -S. Holmes [71]
The sum is over all integers k; but it is really a finite sum when r is a nonnegative integer, because all terms are zero except those with $0 \le k \le r$. On the other hand, the theorem is also valid when r is negative, or even when r is an arbitrary real or complex number. In such cases the sum really is infinite, and we must have $|x/y| < 1$ to guarantee the sum’s absolute convergence.
Two special cases of the binomial theorem are worth special attention, even though they are extremely simple. If x = y = 1 and r = n is nonnegative, we get
$$2^n = \binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{n}, \qquad \text{integer } n \ge 0.$$
This equation tells us that row n of Pascal’s triangle sums to $2^n$. And when x is −1 instead of +1, we get
$$0^n = \binom{n}{0} - \binom{n}{1} + \cdots + (-1)^n\binom{n}{n}, \qquad \text{integer } n \ge 0.$$
For example, 1 − 4 + 6 − 4 + 1 = 0; the elements of row n sum to zero if we give them alternating signs, except in the top row (when n = 0 and $0^0 = 1$).
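Both special cases are quick to confirm for small n. The sketch below uses `math.comb` from the Python standard library; the function name `row_sums` is hypothetical. Python’s integer arithmetic happens to follow the convention $0^0 = 1$ adopted above, so `0 ** n` can serve directly as the expected alternating sum.

```python
from math import comb

def row_sums(n):
    """Return (plain sum, alternating sum) of row n of Pascal's triangle."""
    row = [comb(n, k) for k in range(n + 1)]
    plain = sum(row)
    alternating = sum((-1) ** k * entry for k, entry in enumerate(row))
    return plain, alternating

for n in range(6):
    # Each printed pair should match (2**n, 0**n).
    print(n, row_sums(n), (2 ** n, 0 ** n))
```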
When r is not a nonnegative integer, we most often use the binomial theorem in the special case y = 1. Let’s state this special case explicitly, writing z instead of x to emphasize the fact that an arbitrary complex number can be involved here:
$$(1+z)^r = \sum_k \binom{r}{k} z^k, \qquad |z| < 1. \tag{5.13}$$
The general formula in (5.12) follows from this one if we set z = x/y and multiply both sides by $y^r$.
We have proved the binomial theorem only when r is a nonnegative integer, by using a combinatorial interpretation. We can’t deduce the general case from the nonnegative-integer case by using the polynomial argument, because the sum is infinite in the general case. But when r is arbitrary, we can use Taylor series and the theory of complex variables:
$$f(z) = \frac{f(0)}{0!}z^0 + \frac{f'(0)}{1!}z^1 + \frac{f''(0)}{2!}z^2 + \cdots = \sum_{k \ge 0} \frac{f^{(k)}(0)}{k!}z^k.$$
The derivatives of the function $f(z) = (1+z)^r$ are easily evaluated; in fact, $f^{(k)}(z) = r^{\underline{k}}(1+z)^{r-k}$. Setting z = 0 gives (5.13).
We also need to prove that the infinite sum converges, when $|z| < 1$. It does, because $\binom{r}{k} = O(k^{-1-r})$ by equation (5.83) below.
(Chapter 9 tells the meaning of O.)
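For a feel of how (5.13) behaves when r is not a nonnegative integer, one can sum the first several terms numerically. This is a floating-point sketch, not part of the proof: the function name `binomial_series`, the choice of 60 terms, and the sample values of r and z are all assumptions made for illustration.

```python
def binomial_series(r, z, terms):
    """Partial sum of sum_k binom(r, k) z^k for 0 <= k < terms (real r, |z| < 1)."""
    total = 0.0
    coefficient = 1.0                  # binom(r, 0)
    power = 1.0                        # z^0
    for k in range(terms):
        total += coefficient * power
        coefficient *= (r - k) / (k + 1)   # binom(r, k+1) from binom(r, k)
        power *= z
    return total

# Square root of 1.2 via r = 1/2, and 1/(1 + 0.5) via r = -1.
print(binomial_series(0.5, 0.2, 60), 1.2 ** 0.5)
print(binomial_series(-1.0, 0.5, 60), 1 / 1.5)
```

The update of `coefficient` uses the same ratio that definition (5.1) implies between consecutive lower indices, so no factorials are needed.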
Now let’s look more closely at the values of $\binom{n}{k}$ when n is a negative integer. One way to approach these values is to use the addition law (5.8) to fill in the entries that lie above the numbers in Table 155, thereby obtaining Table 164. For example, we must have $\binom{-1}{0} = 1$, since $\binom{0}{0} = \binom{-1}{0} + \binom{-1}{-1}$ and $\binom{-1}{-1} = 0$; then we must have $\binom{-1}{1} = -1$, since $\binom{0}{1} = \binom{-1}{1} + \binom{-1}{0}$; and so on.
Table 164 Pascal’s triangle, extended upward. (The entry in row n, column k is $\binom{n}{k}$.)

  n   k=0    1    2    3    4    5    6     7    8     9   10
 -4     1   -4   10  -20   35  -56   84  -120  165  -220  286
 -3     1   -3    6  -10   15  -21   28   -36   45   -55   66
 -2     1   -2    3   -4    5   -6    7    -8    9   -10   11
 -1     1   -1    1   -1    1   -1    1    -1    1    -1    1
  0     1    0    0    0    0    0    0     0    0     0    0
All these numbers are familiar. Indeed, the rows and columns of Table 164 appear as columns in Table 155 (but minus the minus signs). So there must be a connection between the values of $\binom{n}{k}$ for negative n and the values for positive n. The general rule is
$$\binom{r}{k} = (-1)^k\binom{k-r-1}{k}, \qquad \text{integer } k; \tag{5.14}$$
it is easily proved, since
$$r^{\underline{k}} = r(r-1)\ldots(r-k+1) = (-1)^k(-r)(1-r)\ldots(k-1-r) = (-1)^k(k-r-1)^{\underline{k}}$$
when $k \ge 0$, and both sides are zero when k < 0.
Identity (5.14) is particularly valuable because it holds without any restriction. (Of course, the lower index must be an integer so that the binomial coefficients are defined.) The transformation in (5.14) is called negating the upper index, or “upper negation.”
But how can we remember this important formula? The other identities
we’ve seen-symmetry, absorption, addition, etc. -are pretty simple, but
this one looks rather messy. Still, there’s a mnemonic that’s not too bad: To
negate the upper index, we begin by writing down $(-1)^k$, where k is the lower
index. (The lower index doesn’t change.) Then we immediately write k again,
twice, in both lower and upper index positions. Then we negate the original
upper index by subtracting it from the new upper index. And we complete
the job by subtracting 1 more (always subtracting, not adding, because this
is a negation process).
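The mnemonic can be turned into a two-line function and compared against definition (5.1). The names `binom`, `sign`, and `upper_negated` below are hypothetical; `sign(k)` computes $(-1)^k$ through the parity of k so that negative k stays in exact integer arithmetic.

```python
from fractions import Fraction

def binom(r, k):
    """Definition (5.1) for rational r and integer k (hypothetical helper)."""
    if k < 0:
        return Fraction(0)
    result = Fraction(1)
    for j in range(k):
        result = result * (Fraction(r) - j) / (j + 1)
    return result

def sign(k):
    """(-1)^k for any integer k."""
    return -1 if k % 2 else 1

def upper_negated(r, k):
    """Right side of (5.14): (-1)^k times binom(k - r - 1, k)."""
    return sign(k) * binom(k - Fraction(r) - 1, k)

# Row n = -3 of Table 164, computed from ordinary positive-index coefficients.
print([int(upper_negated(-3, k)) for k in range(6)])   # [1, -3, 6, -10, 15, -21]
```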
Let’s negate the upper index twice in succession, for practice. We get
$$\binom{r}{k} = (-1)^k\binom{k-r-1}{k} = (-1)^{2k}\binom{k-(k-r-1)-1}{k} = \binom{r}{k},$$
so we’re right back where we started. This is probably not what the framers of the identity intended; but it’s reassuring to know that we haven’t gone astray.
You call this a mnemonic? I’d call it pneumatic, full of air. It does help me remember, though.
(Now is a good time to do warmup exercise 4.)
It’s also frustrating, if we’re trying to get somewhere else.
Some applications of (5.14) are, of course, more useful than this. We can use upper negation, for example, to move quantities between upper and lower index positions. The identity has a symmetric formulation,
$$(-1)^m\binom{-n-1}{m} = (-1)^n\binom{-m-1}{n}, \qquad \text{integers } m, n \ge 0, \tag{5.15}$$
which holds because both sides are equal to $\binom{m+n}{n}$.
Upper negation can also be used to derive the following interesting sum:
$$\sum_{k \le m} \binom{r}{k}(-1)^k = \binom{r}{0} - \binom{r}{1} + \cdots + (-1)^m\binom{r}{m} = (-1)^m\binom{r-1}{m}, \qquad \text{integer } m. \tag{5.16}$$
The idea is to negate the upper index, then apply (5.9), and negate again:
$$\sum_{k \le m} \binom{r}{k}(-1)^k = \sum_{k \le m} \binom{k-r-1}{k} = \binom{-r+m}{m} = (-1)^m\binom{r-1}{m}.$$
(Here double negation helps, because we’ve sandwiched another operation in between.)
This formula gives us a partial sum of the rth row of Pascal’s triangle, provided that the entries of the row have been given alternating signs. For instance, if r = 5 and m = 2 the formula gives $1 - 5 + 10 = 6 = (-1)^2\binom{4}{2}$.
Notice that if $m \ge r$, (5.16) gives the alternating sum of the entire row, and this sum is zero when r is a positive integer. We proved this before, when we expanded $(1-1)^r$ by the binomial theorem; it’s interesting to know that the partial sums of this expression can also be evaluated in closed form.
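The closed form (5.16) for alternating partial sums can be compared with direct summation. In this sketch `binom`, `sign`, and `alternating_partial_sum` are hypothetical names, and exact rationals stand in for a real upper index.

```python
from fractions import Fraction

def binom(r, k):
    """Definition (5.1) for rational r and integer k (hypothetical helper)."""
    if k < 0:
        return Fraction(0)
    result = Fraction(1)
    for j in range(k):
        result = result * (Fraction(r) - j) / (j + 1)
    return result

def sign(k):
    """(-1)^k for any integer k."""
    return -1 if k % 2 else 1

def alternating_partial_sum(r, m):
    """Left side of (5.16); terms with k < 0 are zero, so k starts at 0."""
    return sum(sign(k) * binom(r, k) for k in range(0, m + 1))

print(alternating_partial_sum(5, 2), sign(2) * binom(4, 2))   # 6 6
print(alternating_partial_sum(5, 7))                          # 0, the whole row
```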
How about the simpler partial sum,
$$\sum_{k \le m} \binom{n}{k} = \binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{m}; \tag{5.17}$$
surely if we can evaluate the corresponding sum with alternating signs, we ought to be able to do this one? But no; there is no closed form for the partial sum of a row of Pascal’s triangle. We can do columns (that’s (5.10)) but not rows. Curiously, however, there is a way to partially sum the row elements if they have been multiplied by their distance from the center:
$$\sum_{k \le m} \binom{r}{k}\left(\frac{r}{2} - k\right) = \frac{m+1}{2}\binom{r}{m+1}, \qquad \text{integer } m. \tag{5.18}$$
(This formula is easily verified by induction on m.) The relation between these partial sums with and without the factor of (r/2 − k) in the summand is analogous to the relation between the integrals
$$\int_{-\infty}^{\alpha} x e^{-x^2}\,dx = -\tfrac{1}{2}e^{-\alpha^2} \qquad \text{and} \qquad \int_{-\infty}^{\alpha} e^{-x^2}\,dx.$$
The apparently more complicated integral on the left, with the factor of x, has a closed form, while the simpler-looking integral on the right, without the factor, has none. Appearances can be deceiving.
(Well, it actually equals $\frac{1}{2}\sqrt{\pi}\,(1 + \operatorname{erf}\alpha)$, a constant plus a multiple of the “error function” of $\alpha$, if we’re willing to accept that as a closed form.)
At the end of this chapter, we’ll study a method by which it’s possible to determine whether or not there is a closed form for the partial sums of a given series involving binomial coefficients, in a fairly general setting. This method is capable of discovering identities (5.16) and (5.18), and it also will tell us that (5.17) is a dead end.
Partial sums of the binomial series lead to a curious relationship of another kind:
$$\sum_{k \le m} \binom{m+r}{k} x^k y^{m-k} = \sum_{k \le m} \binom{-r}{k} (-x)^k (x+y)^{m-k}, \qquad \text{integer } m. \tag{5.19}$$
This identity isn’t hard to prove by induction: Both sides are zero when m < 0 and 1 when m = 0. If we let $S_m$ stand for the sum on the left, we can apply the addition formula (5.8) and show easily that
$$S_m = \sum_{k \le m} \binom{m-1+r}{k} x^k y^{m-k} + \sum_{k \le m} \binom{m-1+r}{k-1} x^k y^{m-k};$$
and
$$\sum_{k \le m} \binom{m-1+r}{k} x^k y^{m-k} = yS_{m-1} + \binom{m-1+r}{m} x^m,$$
$$\sum_{k \le m} \binom{m-1+r}{k-1} x^k y^{m-k} = xS_{m-1},$$
when m > 0. Hence
$$S_m = (x+y)S_{m-1} + \binom{-r}{m}(-x)^m,$$
and this recurrence is satisfied also by the right-hand side of (5.19). By induction, both sides must be equal; QED.
But there’s a neater proof. When r is an integer in the range $0 \ge r \ge -m$, the binomial theorem tells us that both sides of (5.19) are $(x+y)^{m+r}y^{-r}$. And since both sides are polynomials in r of degree m or less, agreement at m + 1 different values is enough (but just barely!) to prove equality in general.
It may seem foolish to have an identity where one sum equals another.
Neither side is in closed form. But sometimes one side turns out to be easier
to evaluate than the other.
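Since neither side of (5.19) is in closed form, a direct numerical comparison of the two sums is a reasonable sanity check. The sketch below uses exact rational arithmetic; the names `binom`, `lhs_519`, and `rhs_519` are hypothetical, and the sample values of r, x, and y are arbitrary choices.

```python
from fractions import Fraction

def binom(r, k):
    """Definition (5.1) for rational r and integer k (hypothetical helper)."""
    if k < 0:
        return Fraction(0)
    result = Fraction(1)
    for j in range(k):
        result = result * (Fraction(r) - j) / (j + 1)
    return result

def lhs_519(m, r, x, y):
    """Left side of (5.19); only 0 <= k <= m contributes."""
    return sum(binom(m + r, k) * x ** k * y ** (m - k) for k in range(m + 1))

def rhs_519(m, r, x, y):
    """Right side of (5.19); only 0 <= k <= m contributes."""
    return sum(binom(-r, k) * (-x) ** k * (x + y) ** (m - k) for k in range(m + 1))

r, x, y = Fraction(3, 4), Fraction(2), Fraction(-5, 3)
print([lhs_519(m, r, x, y) == rhs_519(m, r, x, y) for m in range(6)])
```

For m = 1 both sides reduce to y + (1 + r)x, which is easy to confirm by hand.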
For example, if we set x = −1 and y = 1, we get
$$\sum_{k \le m} \binom{m+r}{k}(-1)^k = \binom{-r}{m}, \qquad \text{integer } m \ge 0,$$
an alternative form of identity (5.16). And if we set x = y = 1 and r = m + 1, we get
$$\sum_{k \le m} \binom{2m+1}{k} = \sum_{k \le m} \binom{m+k}{k} 2^{m-k}.$$
The left-hand side sums just half of the binomial coefficients with upper index 2m + 1, and these are equal to their counterparts in the other half because Pascal’s triangle has left-right symmetry. Hence the left-hand side is just $\frac{1}{2}2^{2m+1} = 2^{2m}$. This yields a formula that is quite unexpected,
$$\sum_{k \le m} \binom{m+k}{k} 2^{-k} = 2^m, \qquad \text{integer } m \ge 0. \tag{5.20}$$
Let’s check it when m = 2: $\binom{2}{0} + \frac{1}{2}\binom{3}{1} + \frac{1}{4}\binom{4}{2} = 1 + \frac{3}{2} + \frac{6}{4} = 4$. Astounding.
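The check for m = 2 extends easily to more values of m. The sketch below uses `math.comb` and exact fractions from the Python standard library; the function name `half_weighted_sum` is hypothetical.

```python
from fractions import Fraction
from math import comb

def half_weighted_sum(m):
    """Left side of (5.20): sum of binom(m+k, k) / 2^k over 0 <= k <= m."""
    return sum(Fraction(comb(m + k, k), 2 ** k) for k in range(m + 1))

for m in range(6):
    # Each printed pair should agree.
    print(m, half_weighted_sum(m), 2 ** m)
```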
So far we’ve been looking either at binomial coefficients by themselves or
at sums of terms in which there’s only one binomial coefficient per term. But
many of the challenging problems we face involve products of two or more
binomial coefficients, so we’ll spend the rest of this section considering how
to deal with such cases.
Here’s a handy rule that often helps to simplify the product of two binomial coefficients:
$$\binom{r}{m}\binom{m}{k} = \binom{r}{k}\binom{r-k}{m-k}, \qquad \text{integers } m, k. \tag{5.21}$$
We’ve already seen the special case k = 1; it’s the absorption identity (5.6). Although both sides of (5.21) are products of binomial coefficients, one side often is easier to sum because of interactions with the rest of a formula. For example, the left side uses m twice, the right side uses it only once. Therefore we usually want to replace $\binom{r}{m}\binom{m}{k}$ by $\binom{r}{k}\binom{r-k}{m-k}$ when summing on m.
Equation (5.21) holds primarily because of cancellation between m!’s in the factorial representations of $\binom{r}{m}$ and $\binom{m}{k}$. If all variables are integers and $r \ge m \ge k \ge 0$, we have
$$\binom{r}{m}\binom{m}{k} = \frac{r!}{m!\,(r-m)!}\,\frac{m!}{k!\,(m-k)!}$$
$$= \frac{r!}{k!\,(m-k)!\,(r-m)!}$$
$$= \frac{r!}{k!\,(r-k)!}\,\frac{(r-k)!}{(m-k)!\,(r-m)!} = \binom{r}{k}\binom{r-k}{m-k}.$$
That was easy. Furthermore, if m < k or k < 0, both sides of (5.21) are zero; so the identity holds for all integers m and k. Finally, the polynomial argument extends its validity to all real r.
Yeah, right.
A binomial coefficient $\binom{r}{k} = r!/(r-k)!\,k!$ can be written in the form $(a+b)!/a!\,b!$ after a suitable renaming of variables. Similarly, the quantity in the middle of the derivation above, $r!/k!\,(m-k)!\,(r-m)!$, can be written in the form $(a+b+c)!/a!\,b!\,c!$. This is a “trinomial coefficient,” which arises in the “trinomial theorem”:
$$(x+y+z)^n = \sum_{\substack{0 \le a,b,c \le n \\ a+b+c=n}} \frac{(a+b+c)!}{a!\,b!\,c!}\, x^a y^b z^c = \sum_{\substack{0 \le a,b,c \le n \\ a+b+c=n}} \binom{a+b+c}{b+c}\binom{b+c}{c}\, x^a y^b z^c.$$
So $\binom{r}{m}\binom{m}{k}$ is really a trinomial coefficient in disguise. Trinomial coefficients pop up occasionally in applications, and we can conveniently write them as
$$\binom{a+b+c}{a,\,b,\,c} = \frac{(a+b+c)!}{a!\,b!\,c!}$$
in order to emphasize the symmetry present.
Binomial and trinomial coefficients generalize to multinomial coefficients, which are always expressible as products of binomial coefficients:
$$\binom{a_1 + a_2 + \cdots + a_m}{a_1, a_2, \ldots, a_m} = \frac{(a_1 + a_2 + \cdots + a_m)!}{a_1!\,a_2! \ldots a_m!} = \binom{a_1 + a_2 + \cdots + a_m}{a_2 + \cdots + a_m} \cdots \binom{a_{m-1} + a_m}{a_m}.$$
Therefore, when we run across such a beastie, our standard techniques apply.
“Excogitavi autem olim mirabilem regulam pro numeris coefficientibus potestatum, non tantum a binomio x + y, sed et a trinomio x + y + z, imo a polynomio quocunque, ut data potentia gradus cujuscunque v. gr. decimi, et potentia in ejus valore comprehensa, ut $x^5y^3z^2$, possim statim assignare numerum coefficientem, quem habere debet, sine ulla Tabula jam calculata.” -G. W. Leibniz
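The two expressions for a multinomial coefficient, one through factorials and one as a product of binomial coefficients, can be compared directly for nonnegative integer parts. The function names below are hypothetical; `math.comb` and `math.factorial` are from the Python standard library.

```python
from math import comb, factorial

def multinomial_by_factorials(parts):
    """(a_1 + ... + a_m)! / (a_1! ... a_m!) for nonnegative integers a_i."""
    result = factorial(sum(parts))
    for a in parts:
        result //= factorial(a)        # each partial quotient is an integer
    return result

def multinomial_by_binomials(parts):
    """Product of binom(a_i + ... + a_m, a_{i+1} + ... + a_m) over i."""
    result = 1
    remaining = sum(parts)
    for a in parts:
        result *= comb(remaining, remaining - a)
        remaining -= a                 # the final factor is binom(a_m, 0) = 1
    return result

# Coefficient of x^5 y^3 z^2 in (x + y + z)^10.
print(multinomial_by_factorials([5, 3, 2]), multinomial_by_binomials([5, 3, 2]))   # 2520 2520
```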
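Table 169 Sums of products of binomial coefficients.
$$\sum_k \binom{r}{m+k}\binom{s}{n-k} = \binom{r+s}{m+n}, \qquad \text{integers } m, n. \tag{5.22}$$
$$\sum_k \binom{l}{m+k}\binom{s}{n+k} = \binom{l+s}{l-m+n}, \qquad \text{integer } l \ge 0,\ \text{integers } m, n. \tag{5.23}$$
$$\sum_k \binom{l}{m+k}\binom{s+k}{n}(-1)^k = (-1)^{l+m}\binom{s-m}{n-l}, \qquad \text{integer } l \ge 0,\ \text{integers } m, n. \tag{5.24}$$
$$\sum_{k \le l} \binom{l-k}{m}\binom{s}{k-n}(-1)^k = (-1)^{l+m}\binom{s-m-1}{l-m-n}, \qquad \text{integers } l, m, n \ge 0. \tag{5.25}$$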
Now we come to Table 169, which lists identities that are among the most important of our standard techniques. These are the ones we rely on when struggling with a sum involving a product of two binomial coefficients. Each of these identities is a sum over k, with one appearance of k in each binomial coefficient; there also are four nearly independent parameters, called m, n, r, etc., one in each index position. Different cases arise depending on whether k appears in the upper or lower index, and on whether it appears with a plus or minus sign. Sometimes there’s an additional factor of $(-1)^k$, which is needed to make the terms summable in closed form.
Fold down the corner on this page, so you can find the table quickly later. You’ll need it!
Table 169 is far too complicated to memorize in full; it is intended only for reference. But the first identity in this table is by far the most memorable, and it should be remembered. It states that the sum (over all integers k) of the product of two binomial coefficients, in which the upper indices are constant and the lower indices have a constant sum for all k, is the binomial coefficient obtained by summing both lower and upper indices. This identity is known as Vandermonde’s convolution, because Alexandre Vandermonde wrote a significant paper about it in the late 1700s [293]; it was, however, known to Chu Shih-Chieh in China as early as 1303. All of the other identities in Table 169 can be obtained from Vandermonde’s convolution by doing things like negating upper indices or applying the symmetry law, with care; therefore Vandermonde’s convolution is the most basic of all.
We can prove Vandermonde’s convolution by giving it a nice combinatorial
interpretation. If we replace k by k - m and n by n - m, we can assume
that m = 0; hence the identity to be proved is
$$\sum_k \binom{r}{k}\binom{s}{n-k} = \binom{r+s}{n}, \qquad \text{integer } n. \tag{5.27}$$
Let r and s be nonnegative integers; the general case then follows by the
polynomial argument.
On the right side, $\binom{r+s}{n}$ is the number of ways to
choose n people from among r men and s women. On the left, each term
of the sum is the number of ways to choose k of the men and n - k of the
women. Summing over all k counts each possibility exactly once.
[Margin note: Sexist! You mentioned men first.]
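The counting argument for (5.27) can also be checked mechanically. The following is a minimal Python sketch, not part of the original text; the function name `vandermonde_lhs` and the test ranges are illustrative assumptions. It compares the sum of C(r, k)·C(s, n - k) with C(r + s, n) for small nonnegative integers r and s.

```python
from math import comb

def vandermonde_lhs(r, s, n):
    """Sum over k of C(r, k) * C(s, n - k); terms outside 0 <= k <= n vanish."""
    return sum(comb(r, k) * comb(s, n - k) for k in range(n + 1))

# Compare both sides of (5.27) for small nonnegative integers r, s, n.
for r in range(6):
    for s in range(6):
        for n in range(r + s + 2):
            assert vandermonde_lhs(r, s, n) == comb(r + s, n)
```

Only nonnegative integer r and s are covered here; the text extends the identity to general r and s by the polynomial argument.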
Much more often than not we use these identities left to right, since that’s
the direction of simplification. But every once in a while it pays to go the
other direction, temporarily making an expression more complicated. When
this works, we’ve usually created a double sum for which we can interchange
the order of summation and then simplify.
Before moving on let’s look at proofs for two more of the identities in
Table 169. It’s easy to prove (5.23); all we need to do is replace the first
binomial coefficient by $\binom{l}{l-m-k}$, then Vandermonde’s (5.22) applies.
The next one, (5.24), is a bit more difficult. We can reduce it to Vandermonde’s
convolution by a sequence of transformations, but we can just
as easily prove it by resorting to the old reliable technique of mathematical
induction. Induction is often the first thing to try when nothing else obvious
jumps out at us, and induction on l works just fine here.
For the basis l = 0, all terms are zero except when k = -m; so both sides
of the equation are $(-1)^m\binom{s-m}{n}$. Now suppose that the identity holds for all
values less than some fixed l, where l > 0. We can use the addition formula
to replace $\binom{l}{m+k}$ by $\binom{l-1}{m+k} + \binom{l-1}{m+k-1}$; the original sum now breaks into two
sums, each of which can be evaluated by the induction hypothesis:
$$\sum_k \binom{l-1}{m+k}\binom{s+k}{n}(-1)^k + \sum_k \binom{l-1}{m+k-1}\binom{s+k}{n}(-1)^k = (-1)^{l-1+m}\binom{s-m}{n-l+1} + (-1)^{l+m}\binom{s-m+1}{n-l+1}.$$
And this simplifies to the right-hand side of (5.24), if we apply the addition
formula once again.
Two things about this derivation are worthy of note. First, we see again
the great convenience of summing over all integers k, not just over a certain
range, because there’s no need to fuss over boundary conditions. Second,
the addition formula works nicely with mathematical induction, because it’s
a recurrence for binomial coefficients. A binomial coefficient whose upper
index is l is expressed in terms of two whose upper indices are l - 1, and
that’s exactly what we need to apply the induction hypothesis.
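As a sanity check alongside the induction proof, identity (5.24), which states that the sum over k of C(l, m+k)·C(s+k, n)·(-1)^k equals (-1)^(l+m)·C(s-m, n-l) for integer l ≥ 0, can be tested numerically. This is a minimal Python sketch under stated assumptions: the helper names `binom`, `sign`, `lhs_524` and `rhs_524` are hypothetical, and only small integer parameters are tried.

```python
from fractions import Fraction
from math import factorial

def binom(r, k):
    """Generalized binomial coefficient r(r-1)...(r-k+1)/k!; zero when k < 0."""
    if k < 0:
        return Fraction(0)
    result = Fraction(1)
    for i in range(k):
        result *= Fraction(r) - i
    return result / factorial(k)

def sign(k):
    """(-1)**k for any integer k, kept exact even when k is negative."""
    return -1 if k % 2 else 1

def lhs_524(l, m, n, s):
    # binom(l, m + k) vanishes unless 0 <= m + k <= l, so k runs over -m .. l - m.
    return sum(binom(l, m + k) * binom(s + k, n) * sign(k)
               for k in range(-m, l - m + 1))

def rhs_524(l, m, n, s):
    return sign(l + m) * binom(s - m, n - l)

for l in range(5):
    for m in range(-2, 4):
        for n in range(-1, 5):
            for s in range(-3, 5):
                assert lhs_524(l, m, n, s) == rhs_524(l, m, n, s)
```

Summing over the finite window -m ≤ k ≤ l - m is equivalent to summing over all integers k, which is the convenience the text points out.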
So much for Table 169. What about sums with three or more binomial
coefficients? If the index of summation is spread over all the coefficients, our
chances of finding a closed form aren’t great: Only a few closed forms are
known for sums of this kind, hence the sum we need might not match the
given specs. One of these rarities, proved in exercise 43, is
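$$\sum_k \binom{m-r+s}{k}\binom{n+r-s}{n-k}\binom{r+k}{m+n} = \binom{r}{m}\binom{s}{n}, \qquad \text{integers } m, n \ge 0. \tag{5.28}$$
Here’s another, more symmetric example:
$$\sum_k \binom{a+b}{a+k}\binom{b+c}{b+k}\binom{c+a}{c+k}(-1)^k = \frac{(a+b+c)!}{a!\,b!\,c!}, \qquad \text{integers } a, b, c \ge 0. \tag{5.29}$$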
This one has a two-coefficient counterpart,
$$\sum_k \binom{a+b}{a+k}\binom{b+a}{b+k}(-1)^k = \frac{(a+b)!}{a!\,b!}, \qquad \text{integers } a, b \ge 0, \tag{5.30}$$
which incidentally doesn’t appear in Table 169. The analogous four-coefficient
sum doesn’t have a closed form, but a similar sum does:
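$$\begin{aligned}
&\sum_k (-1)^k \binom{a+b}{a+k}\binom{b+c}{b+k}\binom{c+d}{c+k}\binom{d+a}{d+k} \bigg/ \binom{2a+2b+2c+2d}{a+b+c+d+k} \\
&\qquad = \frac{(a+b+c+d)!\,(a+b+c)!\,(a+b+d)!\,(a+c+d)!\,(b+c+d)!}{(2a+2b+2c+2d)!\,(a+c)!\,(b+d)!\,a!\,b!\,c!\,d!}, \qquad \text{integers } a, b, c, d \ge 0.
\end{aligned}$$
This was discovered by John Dougall [69] early in the twentieth century.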
Is Dougall’s identity the hairiest sum of binomial coefficients known? No!
The champion so far is
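$$\sum_{k_{ij}} (-1)^{\sum_{i<j} k_{ij}} \Biggl(\prod_{1\le i<j<n}\binom{a_i+a_j}{a_j+k_{ij}}\Biggr)\Biggl(\prod_{1\le j<n}\binom{a_j+a_n}{a_n+\sum_{i<j}k_{ij}-\sum_{i>j}k_{ji}}\Biggr) = \binom{a_1+\cdots+a_n}{a_1, a_2, \ldots, a_n}, \qquad \text{integers } a_1, a_2, \ldots, a_n \ge 0. \tag{5.31}$$
Here the sum is over $\binom{n-1}{2}$ index variables $k_{ij}$ for 1 ≤ i < j < n. Equation
(5.29) is the special case n = 3; the case n = 4 can be written out as follows,
if we use (a, b, c, d) for $(a_1, a_2, a_3, a_4)$ and (i, j, k) for $(k_{12}, k_{13}, k_{23})$:
$$\sum_{i,j,k}(-1)^{i+j+k}\binom{a+b}{b+i}\binom{a+c}{c+j}\binom{b+c}{c+k}\binom{a+d}{d-i-j}\binom{b+d}{d+i-k}\binom{c+d}{d+j+k} = \frac{(a+b+c+d)!}{a!\,b!\,c!\,d!}, \qquad \text{integers } a, b, c, d \ge 0.$$
The left side of (5.31) is the coefficient of $z_1^0 z_2^0 \ldots z_n^0$ after the product of
n(n - 1) fractions
$$\prod_{\substack{1\le i,j\le n \\ i\ne j}}\Bigl(1-\frac{z_i}{z_j}\Bigr)^{a_i}$$
has been fully expanded into positive and negative powers of the z’s. The
right side of (5.31) was conjectured by Freeman Dyson in 1962 and proved by
several people shortly thereafter. Exercise 86 gives a “simple” proof of (5.31).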
Another noteworthy identity involving lots of binomial coefficients is
$$\sum_{j,k}(-1)^{j+k}\binom{j+k}{k}\binom{r}{j}\binom{n}{k}\binom{m+n-j-k}{m-j} = \binom{n+r}{n}\binom{m-r}{m-n}, \qquad \text{integers } m, n \ge 0. \tag{5.32}$$
This one, proved in exercise 83, even has a chance of arising in practical
applications. But we’re getting far afield from our theme of “basic identities,”
so we had better stop and take stock of what we’ve learned.
We’ve seen that binomial coefficients satisfy an almost bewildering va-
riety of identities. Some of these, fortunately, are easily remembered, and
we can use the memorable ones to derive most of the others in a few steps.
Table 174 collects ten of the most useful formulas, all in one place; these are
the best identities to know.
5.2 BASIC PRACTICE
In the previous section we derived a bunch of identities by manipulating
sums and plugging in other identities. It wasn’t too tough to find those
derivations: we knew what we were trying to prove, so we could formulate
a general plan and fill in the details without much trouble. Usually, however,
out in the real world, we’re not faced with an identity to prove; we’re faced
with a sum to simplify. And we don’t know what a simplified form might
look like (or even if one exists). By tackling many such sums in this section
and the next, we will hone our binomial coefficient tools.
[Margin note:
Algorithm self-teach:
1 read problem
2 attempt solution
3 skim book solution
4 if attempt failed goto 1
  else goto next problem
Unfortunately that algorithm can put you in an infinite loop.
Suggested patches:
0 set c ← 0
3a set c ← c + 1
3b if c = N goto your TA
- E. W. Dijkstra
. . . But this subchapter is called BASIC practice.]
To start, let’s try our hand at a few sums involving a single binomial
coefficient.
Problem 1: A sum of ratios.
We’d like to have a closed form for
$$\sum_{k=0}^{m} \binom{m}{k} \bigg/ \binom{n}{k}, \qquad \text{integers } n \ge m \ge 0.$$
At first glance this sum evokes panic, because we haven’t seen any identities
that deal with a quotient of binomial coefficients. (Furthermore the sum
involves two binomial coefficients, which seems to contradict the sentence
preceding this problem.) However, just as we can use the factorial representations
to reexpress a product of binomial coefficients as another product
(that’s how we got identity (5.21)), we can do likewise with a quotient. In
fact we can avoid the grubby factorial representations by letting r = n and
dividing both sides of equation (5.21) by $\binom{n}{k}\binom{n}{m}$; this yields
$$\binom{m}{k} \bigg/ \binom{n}{k} = \binom{n-k}{m-k} \bigg/ \binom{n}{m}.$$
So we replace the quotient on the left, which appears in our sum, by the one
on the right; the sum becomes
$$\sum_{k=0}^{m} \binom{m}{k} \bigg/ \binom{n}{k} = \sum_{k=0}^{m} \binom{n-k}{m-k} \bigg/ \binom{n}{m}.$$
We still have a quotient, but the binomial coefficient in the denominator
doesn’t involve the index of summation k, so we can remove it from the sum.
We’ll restore it later.
We can also simplify the boundary conditions by summing over all k ≥ 0;
the terms for k > m are zero. The sum that’s left isn’t so intimidating:
$$\sum_{k \ge 0} \binom{n-k}{m-k}.$$
It’s similar to the one in identity (5.9), because the index k appears twice
with the same sign. But here it’s -k and in (5.9) it’s not. The next step
should therefore be obvious; there’s only one reasonable thing to do:
$$\sum_{k \ge 0} \binom{n-k}{m-k} = \sum_{m-k \ge 0} \binom{n-(m-k)}{m-(m-k)} = \sum_{k \le m} \binom{n-m+k}{k}.$$
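Table 174  The top ten binomial coefficient identities.

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}, \qquad \text{integers } n \ge k \ge 0. \qquad \text{(factorial expansion)}$$

$$\binom{n}{k} = \binom{n}{n-k}, \qquad \text{integer } n \ge 0,\ \text{integer } k. \qquad \text{(symmetry)}$$

$$\binom{r}{k} = \frac{r}{k}\binom{r-1}{k-1}, \qquad \text{integer } k \ne 0. \qquad \text{(absorption/extraction)}$$

$$\binom{r}{k} = \binom{r-1}{k} + \binom{r-1}{k-1}, \qquad \text{integer } k. \qquad \text{(addition/induction)}$$

$$\binom{r}{k} = (-1)^k\binom{k-r-1}{k}, \qquad \text{integer } k. \qquad \text{(upper negation)}$$

$$\binom{r}{m}\binom{m}{k} = \binom{r}{k}\binom{r-k}{m-k}, \qquad \text{integers } m, k. \qquad \text{(trinomial revision)}$$

$$\sum_k \binom{r}{k} x^k y^{r-k} = (x+y)^r, \qquad \text{integer } r \ge 0, \text{ or } |x/y| < 1. \qquad \text{(binomial theorem)}$$

$$\sum_{k \le n} \binom{r+k}{k} = \binom{r+n+1}{n}, \qquad \text{integer } n. \qquad \text{(parallel summation)}$$

$$\sum_{0 \le k \le n} \binom{k}{m} = \binom{n+1}{m+1}, \qquad \text{integers } m, n \ge 0. \qquad \text{(upper summation)}$$

$$\sum_k \binom{r}{k}\binom{s}{n-k} = \binom{r+s}{n}, \qquad \text{integer } n. \qquad \text{(Vandermonde convolution)}$$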
And now we can apply the parallel summation identity, (5.9):
$$\sum_{k \le m} \binom{n-m+k}{k} = \binom{(n-m)+m+1}{m} = \binom{n+1}{m}.$$
Finally we reinstate the $\binom{n}{m}$ in the denominator that we removed from
the sum earlier, and then apply (5.7) to get the desired closed form:
$$\binom{n+1}{m} \bigg/ \binom{n}{m} = \frac{n+1}{n+1-m}.$$
This derivation actually works for any real value of n, as long as no division
by zero occurs; that is, as long as n isn’t one of the integers 0, 1, . . . , m - 1.
The more complicated the derivation, the more important it is to check
the answer. This one wasn’t too complicated but we’ll check anyway. In the
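small case m = 2 and n = 4 we have
$$\binom{2}{0} \bigg/ \binom{4}{0} + \binom{2}{1} \bigg/ \binom{4}{1} + \binom{2}{2} \bigg/ \binom{4}{2} = 1 + \frac{1}{2} + \frac{1}{6} = \frac{5}{3};$$
yes, this agrees perfectly with our closed form (4 + 1)/(4 + 1 - 2).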
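The same check can be run over many cases with exact rational arithmetic. This is a minimal Python sketch, not from the original text; the name `ratio_sum` and the bound 12 are illustrative assumptions. It compares the sum of C(m, k)/C(n, k) over 0 ≤ k ≤ m with (n + 1)/(n + 1 - m).

```python
from fractions import Fraction
from math import comb

def ratio_sum(m, n):
    """Sum of C(m, k) / C(n, k) for 0 <= k <= m, assuming integers n >= m >= 0."""
    return sum(Fraction(comb(m, k), comb(n, k)) for k in range(m + 1))

# Compare with the closed form (n + 1) / (n + 1 - m).
for n in range(13):
    for m in range(n + 1):
        assert ratio_sum(m, n) == Fraction(n + 1, n + 1 - m)
```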
Problem 2: From the literature of sorting.
Our next sum appeared way back in ancient times (the early 1970s)
before people were fluent with binomial coefficients. A paper that introduced
an improved merging technique [165] concludes with the following remarks:
“It can be shown that the expected number of saved transfers . . . is given by
the expression
$$\sum_{r=0}^{n} r\,\frac{{}_{m-r-1}C_{m-n-1}}{{}_{m}C_{n}}.$$
Here m and n are as defined above, and mCn is the symbol for the number
of combinations of m objects taken n at a time. . . . The author is grateful to
the referee for reducing a more complex equation for expected transfers saved
to the form given here.”
[Margin note: Please, don’t remind me of the midterm.]
We’ll see that this is definitely not a final answer to the author’s problem.
It’s not even a midterm answer.
First we should translate the sum into something we can work with; the
ghastly notation ${}_{m-r-1}C_{m-n-1}$ is enough to stop anybody, save the enthusiastic
referee (please). In our language we’d write
$$T = \sum_{k=0}^{n} k \binom{m-k-1}{m-n-1} \bigg/ \binom{m}{n}, \qquad \text{integers } m > n \ge 0.$$
The binomial coefficient in the denominator doesn’t involve the index of summation,
so we can remove it and work with the new sum
$$S = \sum_{k=0}^{n} k \binom{m-k-1}{m-n-1}.$$
What next? The index of summation appears in the upper index of the
binomial coefficient but not in the lower index. So if the other k weren’t there,
we could massage the sum and apply summation on the upper index (5.10).
With the extra k, though, we can’t. If we could somehow absorb that k into
the binomial coefficient, using one of our absorption identities, we could then
sum on the upper index. Unfortunately those identities don’t work here. But
if the k were instead m - k, we could use absorption identity (5.6):
$$(m-k)\binom{m-k-1}{m-n-1} = (m-n)\binom{m-k}{m-n}.$$
So here’s the key: We’ll rewrite k as m - (m - k) and split the sum S
into two sums:
$$\begin{aligned}
\sum_{k=0}^{n} k\binom{m-k-1}{m-n-1} &= \sum_{k=0}^{n} \bigl(m-(m-k)\bigr)\binom{m-k-1}{m-n-1} \\
&= \sum_{k=0}^{n} m\binom{m-k-1}{m-n-1} - \sum_{k=0}^{n} (m-k)\binom{m-k-1}{m-n-1} \\
&= m\sum_{k=0}^{n} \binom{m-k-1}{m-n-1} - \sum_{k=0}^{n} (m-n)\binom{m-k}{m-n} \\
&= mA - (m-n)B,
\end{aligned}$$
where
$$A = \sum_{k=0}^{n}\binom{m-k-1}{m-n-1}, \qquad B = \sum_{k=0}^{n}\binom{m-k}{m-n}.$$
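The sums A and B that remain are none other than our old friends in
which the upper index varies while the lower index stays fixed. Let’s do B
first, because it looks simpler. A little bit of massaging is enough to make the
summand match the left side of (5.10):
$$\sum_{0 \le k \le n}\binom{m-k}{m-n} = \sum_{0 \le m-k \le n}\binom{m-(m-k)}{m-n} = \sum_{m-n \le k \le m}\binom{k}{m-n} = \sum_{0 \le k \le m}\binom{k}{m-n}.$$
In the last step we’ve included the terms with 0 ≤ k < m - n in the sum;
they’re all zero, because the upper index is less than the lower. Now we sum
on the upper index, using (5.10), and get
$$B = \sum_{0 \le k \le m}\binom{k}{m-n} = \binom{m+1}{m-n+1}.$$
[Margin note: Do old exams ever die?]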
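The other sum A is the same, but with m replaced by m - 1. Hence we
have a closed form for the given sum S, which can be further simplified:
$$\begin{aligned}
S = mA - (m-n)B &= m\binom{m}{m-n} - (m-n)\binom{m+1}{m-n+1} \\
&= \Bigl(m - (m-n)\frac{m+1}{m-n+1}\Bigr)\binom{m}{m-n} \\
&= \Bigl(\frac{n}{m-n+1}\Bigr)\binom{m}{m-n}.
\end{aligned}$$
And this gives us a closed form for the original sum:
$$T = S \bigg/ \binom{m}{n} = \frac{n}{m-n+1}\binom{m}{m-n} \bigg/ \binom{m}{n} = \frac{n}{m-n+1}.$$
Even the referee can’t simplify this.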
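Again we use a small case to check the answer. When m = 4 and n = 2,
we have
$$T = 0\cdot\binom{3}{1} \bigg/ \binom{4}{2} + 1\cdot\binom{2}{1} \bigg/ \binom{4}{2} + 2\cdot\binom{1}{1} \bigg/ \binom{4}{2} = 0 + \frac{2}{6} + \frac{2}{6} = \frac{2}{3},$$
which agrees with our formula 2/(4 - 2 + 1).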
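A broader check of the closed form T = n/(m - n + 1) is easy to script. The following Python sketch is an illustration rather than part of the original text; the function name `saved_transfers` and the range of m are assumptions chosen for the example.

```python
from fractions import Fraction
from math import comb

def saved_transfers(m, n):
    """T = sum over 0 <= k <= n of k * C(m-k-1, m-n-1) / C(m, n), for integers m > n >= 0."""
    return sum(k * Fraction(comb(m - k - 1, m - n - 1), comb(m, n))
               for k in range(n + 1))

# Compare with the closed form n / (m - n + 1).
for m in range(1, 15):
    for n in range(m):
        assert saved_transfers(m, n) == Fraction(n, m - n + 1)
```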
Problem 3: From an old exam.
Let’s do one more sum that involves a single binomial coefficient. This
one, unlike the last, originated in the halls of academia; it was a problem on
a take-home test. We want the value of $Q_{1000000}$, when
$$Q_n = \sum_{k \le 2^n} \binom{2^n-k}{k}(-1)^k, \qquad \text{integer } n \ge 0.$$
This one’s harder than the others; we can’t apply any of the identities we’ve
seen so far. And we’re faced with a sum of $2^{1000000}$ terms, so we can’t just
add them up. The index of summation k appears in both indices, upper and
lower, but with opposite signs. Negating the upper index doesn’t help, either;
it removes the factor of $(-1)^k$, but it introduces a 2k in the upper index.
When nothing obvious works, we know that it’s best to look at small
cases. If we can’t spot a pattern and prove it by induction, at least we’ll have
some data for checking our results. Here are the nonzero terms and their sums
for the first four values of n.

n   Q_n
0   $\binom{1}{0} = 1 = 1$
1   $\binom{2}{0} - \binom{1}{1} = 1 - 1 = 0$
2   $\binom{4}{0} - \binom{3}{1} + \binom{2}{2} = 1 - 3 + 1 = -1$
3   $\binom{8}{0} - \binom{7}{1} + \binom{6}{2} - \binom{5}{3} + \binom{4}{4} = 1 - 7 + 15 - 10 + 1 = 0$
We’d better not try the next case, n = 4; the chances of making an arithmetic
error are too high. (Computing terms like $\binom{12}{4}$ and $\binom{11}{5}$ by hand, let alone
combining them with the others, is worthwhile only if we’re desperate.)
So the pattern starts out 1, 0, -1, 0. Even if we knew the next term or
two, the closed form wouldn’t be obvious. But if we could find and prove a
recurrence for $Q_n$, we’d probably be able to guess and prove its closed form.
To find a recurrence, we need to relate $Q_n$ to $Q_{n-1}$ (or to $Q_{\text{smaller values}}$); but
to do this we need to relate a term like $\binom{128-13}{13}$, which arises when n = 7 and
k = 13, to terms like $\binom{64-13}{13}$. This doesn’t look promising; we don’t know
any neat relations between entries in Pascal’s triangle that are 64 rows apart.
The addition formula, our main tool for induction proofs, only relates entries
that are one row apart.
But this leads us to a key observation: There’s no need to deal with
entries that are $2^{n-1}$ rows apart. The variable n never appears by itself, it’s
always in the context $2^n$. So the $2^n$ is a red herring! If we replace $2^n$ by m,
all we need to do is find a closed form for the more general (but easier) sum
$$R_m = \sum_{k \le m} \binom{m-k}{k}(-1)^k, \qquad \text{integer } m \ge 0;$$
then we’ll also have a closed form for $Q_n = R_{2^n}$. And there’s a good chance
that the addition formula will give us a recurrence for the sequence $R_m$.
[Margin note: Oh, the sneakiness of the instructor who set that exam.]
Values of $R_m$ for small m can be read from Table 155, if we alternately
add and subtract values that appear in a southwest-to-northeast diagonal.
The results are:

m     0   1   2   3   4   5   6   7   8   9   10
R_m   1   1   0  -1  -1   0   1   1   0  -1  -1

There seems to be a lot of cancellation going on.
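Let’s look now at the formula for $R_m$ and see if it defines a recurrence.
Our strategy is to apply the addition formula (5.8) and to find sums that
have the form $R_k$ in the resulting expression, somewhat as we did in the
perturbation method of Chapter 2:
$$\begin{aligned}
R_m &= \sum_{k \le m} \binom{m-k}{k}(-1)^k \\
&= \sum_{k \le m} \binom{m-1-k}{k}(-1)^k + \sum_{k \le m} \binom{m-1-k}{k-1}(-1)^k \\
&= \sum_{k \le m} \binom{m-1-k}{k}(-1)^k + \sum_{k+1 \le m} \binom{m-2-k}{k}(-1)^{k+1} \\
&= \sum_{k \le m-1} \binom{m-1-k}{k}(-1)^k + \binom{-1}{m}(-1)^m - \sum_{k \le m-2} \binom{m-2-k}{k}(-1)^k - \binom{-1}{m-1}(-1)^{m-1} \\
&= R_{m-1} + (-1)^{2m} - R_{m-2} - (-1)^{2(m-1)} \\
&= R_{m-1} - R_{m-2}.
\end{aligned}$$
(In the next-to-last step we’ve used the formula $\binom{-1}{m} = (-1)^m$, which we know
is true when m ≥ 0.) This derivation is valid for m ≥ 2.
[Margin note: Anyway those of us who’ve done warmup exercise 4 know it.]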
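From this recurrence we can generate values of $R_m$ quickly, and we soon
perceive that the sequence is periodic. Indeed,
$$R_m = \begin{cases} 1, & \text{if } m \bmod 6 = 0; \\ 1, & \text{if } m \bmod 6 = 1; \\ 0, & \text{if } m \bmod 6 = 2; \\ -1, & \text{if } m \bmod 6 = 3; \\ -1, & \text{if } m \bmod 6 = 4; \\ 0, & \text{if } m \bmod 6 = 5. \end{cases}$$
The proof by induction is by inspection. Or, if we must give a more academic
proof, we can unfold the recurrence one step to obtain
$$R_m = (R_{m-2} - R_{m-3}) - R_{m-2} = -R_{m-3},$$
whenever m ≥ 3. Hence $R_m = R_{m-6}$ whenever m ≥ 6.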
Finally, since $Q_n = R_{2^n}$, we can determine $Q_n$ by determining $2^n \bmod 6$
and using the closed form for $R_m$. When n = 0 we have $2^0 \bmod 6 = 1$; after
that we keep multiplying by 2 (mod 6), so the pattern 2, 4 repeats. Thus
$$Q_n = R_{2^n} = \begin{cases} R_1 = 1, & \text{if } n = 0; \\ R_2 = 0, & \text{if } n \text{ is odd}; \\ R_4 = -1, & \text{if } n > 0 \text{ is even.} \end{cases}$$
This closed form for $Q_n$ agrees with the first four values we calculated when
we started on the problem. We conclude that $Q_{1000000} = R_4 = -1$.
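The whole argument of Problem 3 fits in a few lines of code. This is a minimal Python sketch, not from the original text; the names `R_direct`, `R`, `Q` and `PERIOD` are illustrative assumptions. It evaluates the general sum directly for small m, compares it with the period-6 pattern, and then uses modular exponentiation to reach n = 1000000.

```python
from math import comb

def R_direct(m):
    """Sum of C(m - k, k) * (-1)^k; terms with k < 0 or 2k > m are zero."""
    return sum((-1) ** k * comb(m - k, k) for k in range(m // 2 + 1))

PERIOD = [1, 1, 0, -1, -1, 0]   # values for m mod 6 = 0, 1, 2, 3, 4, 5

def R(m):
    """Closed form: the sequence is periodic with period 6."""
    return PERIOD[m % 6]

def Q(n):
    """Q_n = R_{2^n}; only 2^n mod 6 matters, so use three-argument pow."""
    return R(pow(2, n, 6))

assert all(R_direct(m) == R(m) for m in range(200))
assert all(R_direct(2 ** n) == Q(n) for n in range(11))
print(Q(1000000))
```

Three-argument `pow` keeps the computation tiny even though the original sum ranges over k up to 2^1000000.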
Problem 4: A sum involving two binomial coefficients.
Our next task is to find a closed form for
$$\sum_{k=0}^{n} k\binom{m-k-1}{m-n-1}, \qquad \text{integers } m > n \ge 0.$$
Wait a minute. Where’s the second binomial coefficient promised in the title
of this problem? And why should we try to simplify a sum we’ve already
simplified? (This is the sum S from Problem 2.)
Well, this is a sum that’s easier to simplify if we view the summand
as a product of two binomial coefficients, and then use one of the general
identities found in Table 169. The second binomial coefficient materializes
when we rewrite k as $\binom{k}{1}$:
$$\sum_{k=0}^{n} k\binom{m-k-1}{m-n-1} = \sum_{0 \le k \le n} \binom{k}{1}\binom{m-k-1}{m-n-1}.$$
And identity (5.26) is the one to apply, since its index of summation appears
in both upper indices and with opposite signs.
But our sum isn’t quite in the correct form yet. The upper limit of
summation should be m - 1, if we’re to have a perfect match with (5.26). No
problem; the terms for n < k ≤ m - 1 are zero. So we can plug in, with
(l, m, n, q) ← (m - 1, m - n - 1, 1, 0); the answer is
$$S = \binom{m}{m-n+1}.$$
This is cleaner than the formula we got before. We can convert it to the
previous formula by using (5.7):
$$\binom{m}{m-n+1} = \frac{n}{m-n+1}\binom{m}{m-n}.$$
Similarly, we can get interesting results by plugging special values into
the other general identities we’ve seen. Suppose, for example, that we set
m = n = 1 and q = 0 in (5.26). Then the identity reads
$$\sum_{0 \le k \le l} (l-k)k = \binom{l+1}{3}.$$
The left side is $l\bigl((l+1)l/2\bigr) - (1^2 + 2^2 + \cdots + l^2)$, so this gives us a brand new
way to solve the sum-of-squares problem that we beat to death in Chapter 2.
The moral of this story is: Special cases of very general sums are some-
times best handled in the general form. When learning general forms, it’s
wise to learn their simple specializations.
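Both results of Problem 4 can be confirmed numerically. The Python sketch below is illustrative only; the names `S` and `sum_of_squares` are assumptions. It checks that the sum of k·C(m-k-1, m-n-1) over 0 ≤ k ≤ n equals C(m, m-n+1), and that the specialization with m = n = 1, q = 0 reproduces the sum of the first l squares.

```python
from math import comb

def S(m, n):
    """Sum of k * C(m-k-1, m-n-1) for 0 <= k <= n, with integers m > n >= 0."""
    return sum(k * comb(m - k - 1, m - n - 1) for k in range(n + 1))

def sum_of_squares(l):
    """From sum (l-k)k = C(l+1, 3): 1^2 + ... + l^2 = l * (l(l+1)/2) - C(l+1, 3)."""
    return l * (l * (l + 1) // 2) - comb(l + 1, 3)

for m in range(1, 15):
    for n in range(m):
        assert S(m, n) == comb(m, m - n + 1)

for l in range(50):
    assert sum_of_squares(l) == sum(k * k for k in range(1, l + 1))
```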
Problem 5: A sum with three factors.
Here’s another sum that isn’t too bad. We wish to simplify
$$\sum_k \binom{n}{k}\binom{s}{k}k, \qquad \text{integer } n \ge 0.$$
The index of summation k appears in both lower indices and with the same
sign; therefore identity (5.23) in Table 169 looks close to what we need. With
a bit of manipulation, we should be able to use it.
The biggest difference between (5.23) and what we have is the extra k in
our sum. But we can absorb k into one of the binomial coefficients by using
one of the absorption identities:
$$\sum_k \binom{n}{k}\binom{s}{k}k = \sum_k \binom{n}{k}\binom{s-1}{k-1}s = s\sum_k \binom{n}{k}\binom{s-1}{k-1}.$$
We don’t care that the s appears when the k disappears, because it’s constant.
And now we’re ready to apply the identity and get the closed form,
$$s\sum_k \binom{n}{k}\binom{s-1}{k-1} = s\binom{n+s-1}{n-1}.$$
If we had chosen in the first step to absorb k into $\binom{n}{k}$, not $\binom{s}{k}$, we wouldn’t
have been allowed to apply (5.23) directly, because n - 1 might be negative;
the identity requires a nonnegative value in at least one of the upper indices.
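Since s need not be an integer here, a check of the closed form s·C(n+s-1, n-1) is a good excuse to use a generalized binomial coefficient. This is a minimal Python sketch under stated assumptions: the helper `binom` and the names `lhs` and `rhs` are hypothetical, and the sample values of s are arbitrary.

```python
from fractions import Fraction
from math import factorial

def binom(r, k):
    """Generalized binomial coefficient r(r-1)...(r-k+1)/k!; zero when k < 0."""
    if k < 0:
        return Fraction(0)
    result = Fraction(1)
    for i in range(k):
        result *= Fraction(r) - i
    return result / factorial(k)

def lhs(n, s):
    """Sum over k of C(n, k) * C(s, k) * k; nonzero terms have 0 <= k <= n."""
    return sum(binom(n, k) * binom(s, k) * k for k in range(n + 1))

def rhs(n, s):
    return s * binom(n + s - 1, n - 1)

for n in range(8):
    for s in [0, 1, 2, 5, -3, Fraction(5, 2), Fraction(-7, 3)]:
        assert lhs(n, s) == rhs(n, s)
```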
Problem 6: A sum with menacing characteristics.
The next sum is more challenging. We seek a closed form for
$$\sum_{k \ge 0} \binom{n+k}{2k}\binom{2k}{k}\frac{(-1)^k}{k+1}, \qquad \text{integer } n \ge 0.$$
[Margin note: So we should deep six this sum, right?]
One useful measure of a sum’s difficulty is the number of times the index of
summation appears. By this measure we’re in deep trouble: k appears six
times. Furthermore, the key step that worked in the previous problem (to
absorb something outside the binomial coefficients into one of them) won’t
work here. If we absorb the k + 1 we just get another occurrence of k in its
place. And not only that: Our index k is twice shackled with the coefficient 2
inside a binomial coefficient. Multiplicative constants are usually harder to
remove than additive constants.
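We’re lucky this time, though. The 2k’s are right where we need them
for identity (5.21) to apply, so we get
$$\sum_{k \ge 0} \binom{n+k}{2k}\binom{2k}{k}\frac{(-1)^k}{k+1} = \sum_{k \ge 0} \binom{n+k}{k}\binom{n}{k}\frac{(-1)^k}{k+1}.$$
The two 2’s disappear, and so does one occurrence of k. So that’s one down
and five to go.
The k + 1 in the denominator is the most troublesome characteristic left,
and now we can absorb it into $\binom{n}{k}$ using identity (5.6):
$$\sum_{k \ge 0} \binom{n+k}{k}\binom{n}{k}\frac{(-1)^k}{k+1} = \sum_{k} \binom{n+k}{k}\binom{n+1}{k+1}\frac{(-1)^k}{n+1} = \frac{1}{n+1}\sum_{k} \binom{n+k}{k}\binom{n+1}{k+1}(-1)^k.$$
(Recall that n ≥ 0.) Two down, four to go.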
To eliminate another k we have two promising options. We could use
symmetry on $\binom{n+k}{k}$; or we could negate the upper index n + k, thereby eliminating
that k as well as the factor $(-1)^k$. Let’s explore both possibilities,
starting with the symmetry option:
$$\frac{1}{n+1}\sum_k \binom{n+k}{k}\binom{n+1}{k+1}(-1)^k = \frac{1}{n+1}\sum_k \binom{n+k}{n}\binom{n+1}{k+1}(-1)^k.$$
Third down, three to go, and we’re in position to make a big gain by plugging
into (5.24): Replacing (l, m, n, s) by (n + 1, 1, n, n), we get
$$\frac{1}{n+1}\sum_k \binom{n+1}{k+1}\binom{n+k}{n}(-1)^k = \frac{1}{n+1}(-1)^n\binom{n-1}{-1} = 0.$$
[Margin note: For a minute I thought we’d have to punt.]
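Zero, eh? After all that work? Let’s check it when n = 2:
$$\binom{2}{0}\binom{0}{0}\frac{1}{1} - \binom{3}{2}\binom{2}{1}\frac{1}{2} + \binom{4}{4}\binom{4}{2}\frac{1}{3} = 1 - \frac{6}{2} + \frac{6}{3} = 0.$$
It checks.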
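Just for the heck of it, let’s explore our other option, negating the upper
index of $\binom{n+k}{k}$:
$$\frac{1}{n+1}\sum_k \binom{n+k}{k}\binom{n+1}{k+1}(-1)^k = \frac{1}{n+1}\sum_k \binom{-n-1}{k}\binom{n+1}{k+1}.$$
Now (5.23) applies, with (l, m, n, s) ← (n + 1, 1, 0, -n - 1), and
$$\frac{1}{n+1}\sum_k \binom{-n-1}{k}\binom{n+1}{k+1} = \frac{1}{n+1}\binom{0}{n}.$$
[Margin note: Try binary search: Replay the middle formula first, to see if the mistake was early or late.]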
Hey wait. This is zero when n > 0, but it’s 1 when n = 0. Our other
path to the solution told us that the sum was zero in all cases! What gives?
The sum actually does turn out to be 1 when n = 0, so the correct answer is
‘[n = 0]’. We must have made a mistake in the previous derivation.
Let’s do an instant replay on that derivation when n = 0, in order to see
where the discrepancy first arises. Ah yes; we fell into the old trap mentioned
earlier: We tried to apply symmetry when the upper index could be negative!
We were not justified in replacing $\binom{n+k}{k}$ by $\binom{n+k}{n}$ when k ranges over all
integers, because this converts zero into a nonzero value when k < -n. (Sorry
about that.)
The other factor in the sum, $\binom{n+1}{k+1}$, turns out to be zero when k < -n,
except when n = 0 and k = -1. Hence our error didn’t show up when we
checked the case n = 2. Exercise 6 explains what we should have done.
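Both the correct answer [n = 0] and the exact spot where symmetry fails can be seen numerically. The following Python sketch is an illustration, not part of the original text; `problem6` and the integer helper `binom` are hypothetical names. It evaluates the original sum exactly and then compares C(n+k, k) with C(n+k, n) at n = 0, k = -1.

```python
from fractions import Fraction
from math import comb, factorial

def problem6(n):
    """Sum over k >= 0 of C(n+k, 2k) C(2k, k) (-1)^k / (k+1); terms with k > n vanish."""
    return sum(Fraction((-1) ** k * comb(n + k, 2 * k) * comb(2 * k, k), k + 1)
               for k in range(n + 1))

def binom(r, k):
    """Binomial coefficient for any integer r and integer k (zero when k < 0)."""
    if k < 0:
        return 0
    num = 1
    for i in range(k):
        num *= r - i
    return num // factorial(k)

# The sum is 1 for n = 0 and 0 for every n > 0.
assert problem6(0) == 1
assert all(problem6(n) == 0 for n in range(1, 15))

# The symmetry trap: with n = 0 and k = -1 the upper index n + k is negative,
# and the "symmetric" replacement turns a zero into a one.
n, k = 0, -1
assert binom(n + k, k) == 0
assert binom(n + k, n) == 1
```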
Problem 7: A new obstacle.
This one’s even tougher; we want a closed form for
$$\sum_{k \ge 0} \binom{n+k}{m+2k}\binom{2k}{k}\frac{(-1)^k}{k+1}, \qquad \text{integers } m, n > 0.$$
If m were 0 we’d have the sum from the problem we just finished. But it’s
not, and we’re left with a real mess: nothing we used in Problem 6 works
here. (Especially not the crucial first step.)
However, if we could somehow get rid of the m, we could use the result
just derived. So our strategy is: Replace $\binom{n+k}{m+2k}$ by a sum of terms like $\binom{l+k}{2k}$
for some nonnegative integer l; the summand will then look like the summand
in Problem 6, and we can interchange the order of summation.
What should we substitute for $\binom{n+k}{m+2k}$? A painstaking examination of the
identities derived earlier in this chapter turns up only one suitable candidate,
namely equation (5.26) in Table 169. And one way to use it is to replace the
parameters (l, m, n, q, k) by (n + k - 1, 2k, m - 1, 0, j), respectively:
$$\begin{aligned}
\sum_{k \ge 0} \binom{n+k}{m+2k}\binom{2k}{k}\frac{(-1)^k}{k+1}
&= \sum_{k \ge 0}\ \sum_{0 \le j \le n+k-1} \binom{n+k-1-j}{2k}\binom{j}{m-1}\binom{2k}{k}\frac{(-1)^k}{k+1} \\
&= \sum_{j \ge 0} \binom{j}{m-1} \sum_{\substack{k \ge j-n+1 \\ k \ge 0}} \binom{n+k-1-j}{2k}\binom{2k}{k}\frac{(-1)^k}{k+1}.
\end{aligned}$$
In the last step we’ve changed the order of summation, manipulating the
conditions below the ∑’s according to the rules of Chapter 2.
We can’t quite replace the inner sum using the result of Problem 6,
because it has the extra condition k ≥ j - n + 1. But this extra condition
is superfluous unless j - n + 1 > 0; that is, unless j ≥ n. And when j ≥ n,
the first binomial coefficient of the inner sum is zero, because its upper index
is between 0 and k - 1, thus strictly less than the lower index 2k. We may
therefore place the additional restriction j < n on the outer sum, without
affecting which nonzero terms are included. This makes the restriction k ≥
j - n + 1 superfluous, and we can use the result of Problem 6. The double
sum now comes tumbling down:
$$\sum_{j \ge 0} \binom{j}{m-1} \sum_{\substack{k \ge j-n+1 \\ k \ge 0}} \binom{n+k-1-j}{2k}\binom{2k}{k}\frac{(-1)^k}{k+1} = \sum_{0 \le j < n} \binom{j}{m-1}[n-1-j=0] = \binom{n-1}{m-1}.$$
The inner sums vanish except when j = n - 1, so we get a simple closed form
as our answer.
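The closed form C(n-1, m-1) for Problem 7 can be confirmed directly. This is a minimal Python sketch, not from the original text; the name `problem7` and the tested range are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def problem7(m, n):
    """Sum over k >= 0 of C(n+k, m+2k) C(2k, k) (-1)^k / (k+1), for integers m, n > 0.

    C(n+k, m+2k) is zero once m + 2k > n + k, so k <= n covers every nonzero term.
    """
    return sum(Fraction((-1) ** k * comb(n + k, m + 2 * k) * comb(2 * k, k), k + 1)
               for k in range(n + 1))

for m in range(1, 10):
    for n in range(1, 10):
        assert problem7(m, n) == comb(n - 1, m - 1)
```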
Problem 8: A different obstacle.
Let’s branch out from Problem 6 in another way by considering the sum
$$S_m = \sum_{k \ge 0} \binom{n+k}{2k}\binom{2k}{k}\frac{(-1)^k}{k+1+m}, \qquad \text{integers } m, n \ge 0.$$
Again, when m = 0 we have the sum we did before; but now the m occurs
in a different place. This problem is a bit harder yet than Problem 7, but
(fortunately) we’re getting better at finding solutions. We can begin as in
Problem 6,
$$S_m = \sum_{k \ge 0} \binom{n+k}{k}\binom{n}{k}\frac{(-1)^k}{k+1+m}.$$
Now (as in Problem 7) we try to expand the part that depends on m into
terms that we know how to deal with. When m was zero, we absorbed k + 1
into $\binom{n}{k}$; if m > 0, we can do the same thing if we expand 1/(k + 1 + m) into
absorbable terms. And our luck still holds: We proved a suitable identity
$$\sum_{j=0}^{m} \binom{m}{j}\binom{r}{j}^{-1} = \frac{r+1}{r+1-m}, \qquad \text{integer } m \ge 0,\ r \notin \{0, 1, \ldots, m-1\}. \tag{5.33}$$
in Problem 1. Replacing r by -k - 2 gives the desired expansion,
$$S_m = \sum_{k \ge 0} \binom{n+k}{k}\binom{n}{k}\frac{(-1)^k}{k+1} \sum_{j \ge 0} \binom{m}{j}\binom{-k-2}{j}^{-1}.$$
Now the $(k+1)^{-1}$ can be absorbed into $\binom{n}{k}$, as planned. In fact, it could
also be absorbed into $\binom{-k-2}{j}^{-1}$. Double absorption suggests that even more
cancellation might be possible behind the scenes. Yes: expanding everything
in our new summand into factorials and going back to binomial coefficients
gives a formula that we can sum on k:
$$\begin{aligned}
S_m &= \frac{m!\,n!}{(m+n+1)!} \sum_{j \ge 0} (-1)^j \binom{m+n+1}{n+1+j} \sum_k \binom{n+1+j}{k+j+1}\binom{-n-1}{k} \\
&= \frac{m!\,n!}{(m+n+1)!} \sum_{j \ge 0} (-1)^j \binom{m+n+1}{n+1+j}\binom{j}{n}.
\end{aligned}$$
[Margin note: They expect us to check this on a sheet of scratch paper.]
The sum over all integers j is zero, by (5.24). Hence $-S_m$ is the sum for j < 0.
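To evaluate $-S_m$ for j < 0, let’s replace j by -k - 1 and sum for k ≥ 0:
$$\begin{aligned}
S_m &= \frac{m!\,n!}{(m+n+1)!} \sum_{k \ge 0} (-1)^k \binom{m+n+1}{n-k}\binom{-k-1}{n} \\
&= \frac{m!\,n!}{(m+n+1)!} \sum_{k \le n} (-1)^{n-k} \binom{m+n+1}{k}\binom{k-n-1}{n} \\
&= \frac{m!\,n!}{(m+n+1)!} \sum_{k \le n} (-1)^{k} \binom{m+n+1}{k}\binom{2n-k}{n} \\
&= \frac{m!\,n!}{(m+n+1)!} \sum_{k \le 2n} (-1)^{k} \binom{m+n+1}{k}\binom{2n-k}{n}.
\end{aligned}$$
Finally (5.25) applies, and we have our answer:
$$S_m = (-1)^n \frac{m!\,n!}{(m+n+1)!}\binom{m}{n} = (-1)^n\, m^{\underline{n}}\, m^{\underline{-n-1}}.$$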
Whew; we’d better check it. When n = 2 we find
$$S_m = \frac{1}{m+1} - \frac{6}{m+2} + \frac{6}{m+3} = \frac{m(m-1)}{(m+1)(m+2)(m+3)}.$$
Our derivation requires m to be an integer, but the result holds for all real m,
because $(m+1)^{\overline{n+1}} S_m$ is a polynomial in m of degree ≤ n.
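The final formula, written out as (-1)^n · m(m-1)...(m-n+1) / ((m+1)(m+2)...(m+n+1)), can be tested for integer and non-integer m alike. This is a minimal Python sketch under stated assumptions: the names `S` and `closed_form` are hypothetical, and the sample values of m are arbitrary nonnegative rationals.

```python
from fractions import Fraction
from math import comb, prod

def S(m, n):
    """Direct evaluation of the Problem 8 sum; terms with k > n vanish."""
    return sum(Fraction((-1) ** k * comb(n + k, 2 * k) * comb(2 * k, k)) / (k + 1 + m)
               for k in range(n + 1))

def closed_form(m, n):
    """(-1)^n times the falling power m(m-1)...(m-n+1) over (m+1)(m+2)...(m+n+1)."""
    falling = prod((m - i for i in range(n)), start=Fraction(1))
    rising = prod((m + i for i in range(1, n + 2)), start=Fraction(1))
    return (-1) ** n * falling / rising

for n in range(7):
    for m in [0, 1, 2, 3, 6, Fraction(1, 2), Fraction(7, 3)]:
        assert S(m, n) == closed_form(m, n)
```

The non-integer samples exercise the remark that the result holds for all real m, at least where no denominator vanishes.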
5.3 TRICKS OF THE TRADE
Let’s look next at three techniques that significantly amplify the
methods we have already learned.
Trick 1: Going halves.
[Margin note: This should really be called Trick 1/2.]
Many of our identities involve an arbitrary real number r. When r has
the special form “integer minus one half,” the binomial coefficient $\binom{r}{k}$ can be
written as a quite different-looking product of binomial coefficients. This leads
to a new family of identities that can be manipulated with surprising ease.
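One way to see how this works is to begin with the duplication formula
$$r^{\underline{k}}\,\bigl(r-\tfrac12\bigr)^{\underline{k}} = (2r)^{\underline{2k}}/2^{2k}, \qquad \text{integer } k \ge 0. \tag{5.34}$$
This identity is obvious if we expand the falling powers and interleave the
factors on the left side:
$$r\bigl(r-\tfrac12\bigr)(r-1)\bigl(r-\tfrac32\bigr)\ldots(r-k+1)\bigl(r-k+\tfrac12\bigr) = \frac{(2r)(2r-1)\ldots(2r-2k+1)}{2\cdot 2\cdot\ldots\cdot 2}.$$
Now we can divide both sides by $k!^2$, and we get
$$\binom{r}{k}\binom{r-1/2}{k} = \binom{2r}{2k}\binom{2k}{k} \bigg/ 2^{2k}, \qquad \text{integer } k. \tag{5.35}$$
If we set k = r = n, where n is an integer, this yields
$$\binom{n-1/2}{n} = \binom{2n}{n} \bigg/ 2^{2n}, \qquad \text{integer } n. \tag{5.36}$$
And negating the upper index gives yet another useful formula,
$$\binom{-1/2}{n} = \Bigl(\frac{-1}{4}\Bigr)^{n}\binom{2n}{n}, \qquad \text{integer } n. \tag{5.37}$$
For example, when n = 4 we have
$$\begin{aligned}
\binom{-1/2}{4} &= \frac{(-1/2)(-3/2)(-5/2)(-7/2)}{4!} \\
&= \Bigl(\frac{-1}{2}\Bigr)^{4}\,\frac{1\cdot 3\cdot 5\cdot 7}{1\cdot 2\cdot 3\cdot 4} \\
&= \Bigl(\frac{-1}{4}\Bigr)^{4}\,\frac{1\cdot 3\cdot 5\cdot 7\cdot 2\cdot 4\cdot 6\cdot 8}{1\cdot 2\cdot 3\cdot 4\cdot 1\cdot 2\cdot 3\cdot 4} \\
&= \Bigl(\frac{-1}{4}\Bigr)^{4}\binom{8}{4}.
\end{aligned}$$
[Margin note: . . . we halve . . .]
Notice how we’ve changed a product of odd numbers into a factorial.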
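Identity (5.35) has an amusing corollary. Let r = n/2, and take the sum
over all integers k. The result is
$$\sum_k \binom{n}{2k}\binom{2k}{k}2^{-2k} = \sum_k \binom{n/2}{k}\binom{(n-1)/2}{k} = \binom{n-1/2}{\lfloor n/2\rfloor}, \qquad \text{integer } n \ge 0 \tag{5.38}$$
by (5.23), because either n/2 or (n - 1)/2 is ⌊n/2⌋, a nonnegative integer!
We can also use Vandermonde’s convolution (5.27) to deduce that
$$\sum_k \binom{-1/2}{k}\binom{-1/2}{n-k} = \binom{-1}{n} = (-1)^n, \qquad \text{integer } n \ge 0.$$
Plugging in the values from (5.37) gives
$$\binom{-1/2}{k}\binom{-1/2}{n-k} = \Bigl(\frac{-1}{4}\Bigr)^{k}\binom{2k}{k}\Bigl(\frac{-1}{4}\Bigr)^{n-k}\binom{2(n-k)}{n-k} = \frac{(-1)^n}{4^n}\binom{2k}{k}\binom{2n-2k}{n-k};$$
this is what sums to $(-1)^n$. Hence we have a remarkable property of the
“middle” elements of Pascal’s triangle:
$$\sum_k \binom{2k}{k}\binom{2n-2k}{n-k} = 4^n, \qquad \text{integer } n \ge 0. \tag{5.39}$$
For example,
$$\binom{0}{0}\binom{6}{3} + \binom{2}{1}\binom{4}{2} + \binom{4}{2}\binom{2}{1} + \binom{6}{3}\binom{0}{0} = 1\cdot 20 + 2\cdot 6 + 6\cdot 2 + 20\cdot 1 = 64 = 4^3.$$
These illustrations of our first trick indicate that it’s wise to try changing
binomial coefficients of the form $\binom{2k}{k}$ into binomial coefficients of the form
$\binom{n-1/2}{k}$, where n is some appropriate integer (usually 0, 1, or k); the resulting
formula might be much simpler.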
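The half-integer identities are easy to test with exact rational arithmetic. The sketch below is illustrative and not part of the original text; the helper `binom` and the sample values of r are assumptions. It checks the relation C(r, k)·C(r - 1/2, k) = C(2r, 2k)·C(2k, k)/2^(2k), its two specializations to upper indices n - 1/2 and -1/2, and the middle-element sum that equals 4^n.

```python
from fractions import Fraction
from math import comb, factorial

def binom(r, k):
    """Generalized binomial coefficient r(r-1)...(r-k+1)/k!; zero when k < 0."""
    if k < 0:
        return Fraction(0)
    result = Fraction(1)
    for i in range(k):
        result *= Fraction(r) - i
    return result / factorial(k)

half = Fraction(1, 2)

# C(r, k) C(r - 1/2, k) = C(2r, 2k) C(2k, k) / 2^(2k), for a few rational r.
for r in [Fraction(7, 3), Fraction(-5, 2), 4, half]:
    for k in range(7):
        assert binom(r, k) * binom(r - half, k) == binom(2 * r, 2 * k) * comb(2 * k, k) / 4 ** k

for n in range(10):
    # C(n - 1/2, n) = C(2n, n) / 2^(2n)
    assert binom(n - half, n) == Fraction(comb(2 * n, n), 4 ** n)
    # C(-1/2, n) = (-1/4)^n C(2n, n)
    assert binom(-half, n) == Fraction(-1, 4) ** n * comb(2 * n, n)
    # Sum of products of "middle" elements of Pascal's triangle is 4^n.
    assert sum(comb(2 * k, k) * comb(2 * (n - k), n - k) for k in range(n + 1)) == 4 ** n
```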
Trick 2: High-order differences.
We saw earlier that it’s possible to evaluate partial sums of the series
$\binom{n}{k}(-1)^k$, but not of the series $\binom{n}{k}$. It turns out that there are many important
applications of binomial coefficients with alternating signs, $\binom{n}{k}(-1)^k$. One
of the reasons for this is that such coefficients are intimately associated with
the difference operator Δ defined in Section 2.6.
The difference Δf of a function f at the point x is
$$\Delta f(x) = f(x+1) - f(x);$$
if we apply Δ again, we get the second difference
$$\Delta^2 f(x) = \Delta f(x+1) - \Delta f(x) = \bigl(f(x+2) - f(x+1)\bigr) - \bigl(f(x+1) - f(x)\bigr) = f(x+2) - 2f(x+1) + f(x),$$
which is analogous to the second derivative. Similarly, we have
$$\Delta^3 f(x) = f(x+3) - 3f(x+2) + 3f(x+1) - f(x);$$
$$\Delta^4 f(x) = f(x+4) - 4f(x+3) + 6f(x+2) - 4f(x+1) + f(x);$$
and so on. Binomial coefficients enter these formulas with alternating signs.
In general, the nth difference is
$$\Delta^n f(x) = \sum_k \binom{n}{k}(-1)^{n-k} f(x+k), \qquad \text{integer } n \ge 0. \tag{5.40}$$
This formula is easily proved by induction, but there’s also a nice way to prove
it directly using the elementary theory of operators. Recall that Section 2.6
defines the shift operator E by the rule
$$E f(x) = f(x+1);$$
hence the operator Δ is E - 1, where 1 is the identity operator defined by the
rule 1 f(x) = f(x). By the binomial theorem,
$$\Delta^n = (E-1)^n = \sum_k \binom{n}{k} E^k (-1)^{n-k}.$$
This is an equation whose elements are operators; it is equivalent to (5.40),
since $E^k$ is the operator that takes f(x) into f(x + k).
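The operator identity can be mirrored in code by treating Δ as a higher-order function. This is a minimal Python sketch, not from the original text; the names `delta`, `iterated_difference` and `nth_difference` are illustrative assumptions. It compares n-fold application of Δ with the closed formula in which the sum over k of C(n, k)·(-1)^(n-k)·f(x+k) gives the nth difference.

```python
from math import comb

def delta(f):
    """The difference operator: (delta f)(x) = f(x + 1) - f(x)."""
    return lambda x: f(x + 1) - f(x)

def iterated_difference(f, n, x):
    """Apply delta n times, then evaluate at x."""
    for _ in range(n):
        f = delta(f)
    return f(x)

def nth_difference(f, n, x):
    """Closed formula: sum over k of C(n, k) (-1)^(n-k) f(x + k)."""
    return sum(comb(n, k) * (-1) ** (n - k) * f(x + k) for k in range(n + 1))

def sample(x):
    return x ** 4 - 3 * x ** 2 + 7   # an arbitrary test function

for n in range(7):
    for x in range(-4, 5):
        assert iterated_difference(sample, n, x) == nth_difference(sample, n, x)
```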
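An interesting and important case arises when we consider negative
falling powers. Let $f(x) = (x-1)^{\underline{-1}} = 1/x$. Then, by rule (2.45), we have
$\Delta f(x) = (-1)(x-1)^{\underline{-2}}$, $\Delta^2 f(x) = (-1)(-2)(x-1)^{\underline{-3}}$, and in general
$$\Delta^n\bigl((x-1)^{\underline{-1}}\bigr) = (-1)^{\underline{n}}\,(x-1)^{\underline{-n-1}} = (-1)^n \frac{n!}{x(x+1)\ldots(x+n)}.$$
Equation (5.40) now tells us that
$$\sum_k \binom{n}{k}\frac{(-1)^k}{x+k} = \frac{n!}{x(x+1)\ldots(x+n)} = x^{-1}\binom{x+n}{n}^{-1}, \qquad x \notin \{0, -1, \ldots, -n\}. \tag{5.41}$$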
For example,
$$\frac{1}{x} - \frac{4}{x+1} + \frac{6}{x+2} - \frac{4}{x+3} + \frac{1}{x+4} = \frac{4!}{x(x+1)(x+2)(x+3)(x+4)} = 1 \bigg/ x\binom{x+4}{4}.$$
The sum in (5.41) is the partial fraction expansion of n!/(x(x+1) . . . (x+n)).
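The partial fraction expansion in (5.41) can be checked at rational points. The following Python sketch is an illustration only; the names `alternating_sum` and `closed_form` and the sample points are assumptions. It compares the sum over k of C(n, k)·(-1)^k/(x+k) with n!/(x(x+1)...(x+n)).

```python
from fractions import Fraction
from math import comb, factorial, prod

def alternating_sum(n, x):
    """Sum over 0 <= k <= n of C(n, k) (-1)^k / (x + k)."""
    return sum(Fraction((-1) ** k * comb(n, k)) / (x + k) for k in range(n + 1))

def closed_form(n, x):
    """n! / (x (x+1) ... (x+n)); x must avoid 0, -1, ..., -n."""
    return Fraction(factorial(n)) / prod(x + i for i in range(n + 1))

for n in range(8):
    for x in [1, 2, 5, Fraction(1, 2), Fraction(-3, 2), Fraction(22, 7)]:
        assert alternating_sum(n, x) == closed_form(n, x)
```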
Significant results can be obtained from positive falling powers too. If
f(x) is a polynomial of degree d, the difference Δf(x) is a polynomial of degree
d - 1; therefore $\Delta^d f(x)$ is a constant, and $\Delta^n f(x) = 0$ if n > d. This extremely
important fact simplifies many formulas.
A closer look gives further information: Let
$$f(x) = a_d x^d + a_{d-1} x^{d-1} + \cdots + a_1 x^1 + a_0 x^0$$
be any polynomial of degree d. We will see in Chapter 6 that we can express
ordinary powers as sums of falling powers (for example, $x^2 = x^{\underline{2}} + x^{\underline{1}}$); hence
there are coefficients $b_d, b_{d-1}, \ldots, b_1, b_0$ such that
$$f(x) = b_d x^{\underline{d}} + b_{d-1} x^{\underline{d-1}} + \cdots + b_1 x^{\underline{1}} + b_0 x^{\underline{0}}.$$
(It turns out that $b_d = a_d$ and $b_0 = a_0$, but the intervening coefficients are
related in a more complicated way.) Let $c_k = k!\,b_k$ for 0 ≤ k ≤ d. Then
$$f(x) = c_d\binom{x}{d} + c_{d-1}\binom{x}{d-1} + \cdots + c_1\binom{x}{1} + c_0\binom{x}{0};$$
thus, any polynomial can be represented as a sum of multiples of binomial
coefficients. Such an expansion is called the Newton series of f(x), because
Isaac Newton used it extensively.
We observed earlier in this chapter that the addition formula implies
$$\Delta\Bigl(\binom{x}{k}\Bigr) = \binom{x}{k-1}.$$
Therefore, by induction, the nth difference of a Newton series is very simple:
$$\Delta^n f(x) = c_d\binom{x}{d-n} + c_{d-1}\binom{x}{d-1-n} + \cdots + c_1\binom{x}{1-n} + c_0\binom{x}{-n}.$$
If we now set x = 0, all terms $c_k\binom{x}{k-n}$ on the right side are zero, except the
term with k - n = 0; hence
$$\Delta^n f(0) = \begin{cases} c_n, & \text{if } n \le d; \\ 0, & \text{if } n > d. \end{cases}$$
The Newton series for f(x) is therefore
$$f(x) = \Delta^d f(0)\binom{x}{d} + \Delta^{d-1} f(0)\binom{x}{d-1} + \cdots + \Delta f(0)\binom{x}{1} + f(0)\binom{x}{0}.$$
For example, suppose $f(x) = x^3$. It’s easy to calculate
f(0) = 0, f(1) = 1, f(2) = 8, f(3) = 27;
Δf(0) = 1, Δf(1) = 7, Δf(2) = 19;
Δ²f(0) = 6, Δ²f(1) = 12;
Δ³f(0) = 6.
So the Newton series is $x^3 = 6\binom{x}{3} + 6\binom{x}{2} + 1\binom{x}{1} + 0\binom{x}{0}$.
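The difference table used for x³ generalizes directly to a small routine. This is a minimal Python sketch, not from the original text; the names `newton_coefficients` and `newton_eval` are illustrative assumptions. It reads the coefficients of the Newton series off the left edge of the difference table and then evaluates the series at nonnegative integers.

```python
from math import comb

def newton_coefficients(f, d):
    """Return [f(0), (delta f)(0), ..., (delta^d f)(0)] from a difference table."""
    row = [f(x) for x in range(d + 1)]
    coeffs = []
    while row:
        coeffs.append(row[0])                         # left edge of the table
        row = [b - a for a, b in zip(row, row[1:])]   # next row of differences
    return coeffs

def newton_eval(coeffs, x):
    """Evaluate sum of c_k * C(x, k) at a nonnegative integer x."""
    return sum(c * comb(x, k) for k, c in enumerate(coeffs))

def cube(x):
    return x ** 3

coeffs = newton_coefficients(cube, 3)
assert coeffs == [0, 1, 6, 6]
assert all(newton_eval(coeffs, x) == x ** 3 for x in range(20))
```

`math.comb` only accepts nonnegative integers, so this sketch evaluates the series at integer points; a generalized binomial coefficient would be needed for other x.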
Our formula $\Delta^n f(0) = c_n$ can also be stated in the following way, using
(5.40) with x = 0:
$$\sum_k \binom{n}{k}(-1)^k\Bigl(c_0\binom{k}{0} + c_1\binom{k}{1} + c_2\binom{k}{2} + \cdots\Bigr) = (-1)^n c_n, \qquad \text{integer } n \ge 0.$$
Here $(c_0, c_1, c_2, \ldots)$ is an arbitrary sequence of coefficients; the infinite sum
$c_0\binom{k}{0} + c_1\binom{k}{1} + c_2\binom{k}{2} + \cdots$ is actually finite for all k ≥ 0, so convergence is not
an issue. In particular, we can prove the important identity
$$\sum_k \binom{n}{k}(-1)^k\bigl(a_0 + a_1 k + \cdots + a_n k^n\bigr) = (-1)^n\, n!\, a_n, \qquad \text{integer } n \ge 0, \tag{5.42}$$
because the polynomial $a_0 + a_1 k + \cdots + a_n k^n$ can always be written as a
Newton series $c_0\binom{k}{0} + c_1\binom{k}{1} + \cdots + c_n\binom{k}{n}$ with $c_n = n!\,a_n$.
Many sums that appear to be hopeless at first glance can actually be
summed almost trivially by using the idea of nth differences. For example,
let’s consider the identity
c
(3
(‘n”“)
(-l)k
=
sn
,
integer n > 0.
(5.43)
This looks very impressive, because it’s quite different from anything we’ve
seen so far. But it really is easy to understand, once we notice the telltale
factor $\binom{n}{k}(-1)^k$ in the summand, because the function
$$f(k) = \binom{r-sk}{n} = \frac{(-1)^n s^n}{n!}\,k^n + \cdots$$
is a polynomial in k of degree n, with leading coefficient $(-1)^n s^n/n!$. Therefore
(5.43) is nothing more than an application of (5.42).
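Both identities are easy to spot-check with exact integer arithmetic. The sketch below is illustrative only: `binom` and `alternating_sum` are hypothetical helper names, the coefficient list `a` is arbitrary, and the test ranges for r and s are small integer samples rather than a proof.

```python
from math import comb, factorial

def binom(r, k):
    """Generalized binomial coefficient r(r-1)...(r-k+1)/k! for any integer r."""
    if k < 0:
        return 0
    numerator = 1
    for j in range(k):
        numerator *= r - j
    # A product of k consecutive integers is always divisible by k!.
    return numerator // factorial(k)

def alternating_sum(n, f):
    """Compute sum_k C(n,k) (-1)^k f(k) over 0 <= k <= n."""
    return sum(comb(n, k) * (-1) ** k * f(k) for k in range(n + 1))

# (5.42) with n = 4: only the leading coefficient a_4 survives, times (-1)^4 4!.
a = [3, -1, 4, 1, 5]                       # a_0 .. a_4, chosen arbitrarily
poly = lambda k: sum(c * k ** i for i, c in enumerate(a))
check_542 = alternating_sum(4, poly) == factorial(4) * a[4]

# (5.43): the sum collapses to s^n for every integer r and s tried here.
check_543 = all(
    alternating_sum(n, lambda k: binom(r - s * k, n)) == s ** n
    for n in range(6) for r in range(-3, 8) for s in range(-3, 4)
)
print(check_542, check_543)
```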
We have discussed Newton series under the assumption that f(x) is a
polynomial. But we’ve also seen that infinite Newton series
$$f(x) = c_0\binom{x}{0} + c_1\binom{x}{1} + c_2\binom{x}{2} + \cdots$$
make sense too, because such sums are always finite when x is a nonnegative
integer. Our derivation of the formula $\Delta^n f(0) = c_n$ works in the infinite case,
just as in the polynomial case; so we have the general identity
$$f(x) = f(0)\binom{x}{0} + \Delta f(0)\binom{x}{1} + \Delta^2 f(0)\binom{x}{2} + \Delta^3 f(0)\binom{x}{3} + \cdots, \qquad \text{integer } x \ge 0. \tag{5.44}$$
This formula is valid for any function f(x) that is defined for nonnegative
integers x. Moreover, if the right-hand side converges for other values of x,
it defines a function that “interpolates” f(x) in a natural way. (There are
infinitely many ways to interpolate function values, so we cannot assert that
(5.44) is true for all x that make the infinite series converge. For example,
if we let $f(x) = \sin(\pi x)$, we have $f(x) = 0$ at all integer points, so the right-hand
side of (5.44) is identically zero; but the left-hand side is nonzero at all
noninteger x.)
A Newton series is finite calculus’s answer to infinite calculus’s Taylor
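series. Just as a Taylor series can be written
$$g(a+x) = \frac{g(a)}{0!}x^0 + \frac{g'(a)}{1!}x^1 + \frac{g''(a)}{2!}x^2 + \frac{g'''(a)}{3!}x^3 + \cdots,$$
the Newton series for $f(x) = g(a+x)$ can be written
$$g(a+x) = \frac{g(a)}{0!}x^{\underline{0}} + \frac{\Delta g(a)}{1!}x^{\underline{1}} + \frac{\Delta^2 g(a)}{2!}x^{\underline{2}} + \frac{\Delta^3 g(a)}{3!}x^{\underline{3}} + \cdots. \tag{5.45}$$
(Since $E = 1 + \Delta$, $E^x = \sum_k \binom{x}{k}\Delta^k$; and $E^x g(a) = g(a+x)$.)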
(This is the same as (5.44), because $\Delta^n f(0) = \Delta^n g(a)$ for all $n \ge 0$ when
$f(x) = g(a+x)$.) Both the Taylor and Newton series are finite when g is a
polynomial, or when x = 0; in addition, the Newton series is finite when x is a
positive integer. Otherwise the sums may or may not converge for particular
values of x. If the Newton series converges when x is not a nonnegative integer,
it might actually converge to a value that’s different from $g(a+x)$, because
the Newton series (5.45) depends only on the spaced-out function values $g(a)$,
$g(a+1)$, $g(a+2)$, . . . .
One example of a convergent Newton series is provided by the binomial
theorem. Let $g(x) = (1+z)^x$, where z is a fixed complex number such that
$|z| < 1$. Then $\Delta g(x) = (1+z)^{x+1} - (1+z)^x = z(1+z)^x$, hence $\Delta^n g(x) =
z^n(1+z)^x$. In this case the infinite Newton series
$$g(a+x) = \sum_n \Delta^n g(a)\binom{x}{n} = (1+z)^a\sum_n \binom{x}{n}z^n$$
converges to the “correct” value $(1+z)^{a+x}$, for all x.
James Stirling tried to use Newton series to generalize the factorial function
to noninteger values. First he found coefficients $s_n$ such that
$$x! = \sum_n s_n\binom{x}{n} = s_0\binom{x}{0} + s_1\binom{x}{1} + s_2\binom{x}{2} + \cdots \tag{5.46}$$
is an identity for x = 0, x = 1, x = 2, etc. But he discovered that the resulting
series doesn’t converge except when x is a nonnegative integer. So he tried
again, this time writing
$$\ln x! = \sum_n s_n\binom{x}{n} = s_0\binom{x}{0} + s_1\binom{x}{1} + s_2\binom{x}{2} + \cdots. \tag{5.47}$$
Now $\Delta(\ln x!) = \ln(x+1)! - \ln x! = \ln(x+1)$, hence
$$s_n = \Delta^n(\ln x!)\big|_{x=0} = \Delta^{n-1}\bigl(\ln(x+1)\bigr)\big|_{x=0} = \sum_k \binom{n-1}{k}(-1)^{n-1-k}\ln(k+1)$$
by (5.40). The coefficients are therefore $s_0 = s_1 = 0$; $s_2 = \ln 2$; $s_3 = \ln 3 - 2\ln 2 = \ln\frac{3}{4}$;
$s_4 = \ln 4 - 3\ln 3 + 3\ln 2 = \ln\frac{32}{27}$; etc. In this way Stirling obtained
a series that does converge (although he didn’t prove it); in fact, his series
converges for all x > -1. He was thereby able to evaluate $\frac{1}{2}!$ satisfactorily.
Exercise 88 tells the rest of the story.
“Forasmuch as these terms increase very fast, their differences will make a diverging
progression, which hinders the ordinate of the parabola from approaching to the truth;
therefore in this and the like cases, I interpolate the logarithms of the terms, whose
differences constitute a series swiftly converging.”
- J. Stirling [281]
(Proofs of convergence were not invented until the nineteenth century.)
Trick 3: Inversion.
A special case of the rule (5.45) we’ve just derived for Newton’s series
can be rewritten in the following way:
$$g(n) = \sum_k \binom{n}{k}(-1)^k f(k) \iff f(n) = \sum_k \binom{n}{k}(-1)^k g(k). \tag{5.48}$$
Invert this: ‘zmb ppo’.
This dual relationship between f and g is called an inversion formula; it’s
rather like the Möbius inversion formulas (4.56) and (4.61) that we encountered
in Chapter 4. Inversion formulas tell us how to solve “implicit recurrences,”
where an unknown sequence is embedded in a sum.
For example, g(n) might be a known function, and f(n) might be unknown;
and we might have found a way to prove that $g(n) = \sum_k \binom{n}{k}(-1)^k f(k)$.
Then (5.48) lets us express f(n) as a sum of known values.
We can prove (5.48) directly by using the basic methods at the beginning
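of this chapter. If $g(n) = \sum_k \binom{n}{k}(-1)^k f(k)$ for all $n \ge 0$, then
$$\begin{aligned}
\sum_k \binom{n}{k}(-1)^k g(k) &= \sum_k \binom{n}{k}(-1)^k \sum_j \binom{k}{j}(-1)^j f(j)\\
&= \sum_j f(j) \sum_k \binom{n}{k}(-1)^{k+j}\binom{k}{j}\\
&= \sum_j f(j) \sum_k \binom{n}{j}(-1)^{k+j}\binom{n-j}{k-j}\\
&= \sum_j f(j)\binom{n}{j} \sum_k (-1)^k\binom{n-j}{k}\\
&= \sum_j f(j)\binom{n}{j}\,[n-j=0] \;=\; f(n).
\end{aligned}$$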
The proof in the other direction is, of course, the same, because the relation
between f and g is symmetric.
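The symmetry of (5.48) means the transform is its own inverse, which a short experiment can confirm. This is a sketch under the assumption that sequences are finite Python lists indexed from 0; the name `binomial_transform` and the sample data are made up for illustration.

```python
from math import comb

def binomial_transform(seq):
    """Map f(0..N) to g(0..N) with g(n) = sum_k C(n,k) (-1)^k f(k)."""
    return [
        sum(comb(n, k) * (-1) ** k * seq[k] for k in range(n + 1))
        for n in range(len(seq))
    ]

f = [2, 7, 1, 8, 2, 8, 1, 8]        # arbitrary sample values of f(0), f(1), ...
g = binomial_transform(f)
recovered = binomial_transform(g)    # applying the same transform again
print(recovered == f)
```

Because g(n) depends only on f(0), ..., f(n), truncating both sequences at the same length loses nothing.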
Let’s illustrate (5.48) by applying it to the “football victory problem”:
A group of n fans of the winning football team throw their hats high into the
air. The hats come back randomly, one hat to each of the n fans. How many
ways h(n, k) are there for exactly k fans to get their own hats back?
For example, if n = 4 and if the hats and fans are named A, B, C, D,
the 4! = 24 possible ways for hats to land generate the following numbers of
rightful owners:
ABCD 4    BACD 2    CABD 1    DABC 0
ABDC 2    BADC 0    CADB 0    DACB 1
ACBD 2    BCAD 1    CBAD 2    DBAC 1
ACDB 1    BCDA 0    CBDA 1    DBCA 2
ADBC 1    BDAC 0    CDAB 0    DCAB 0
ADCB 2    BDCA 1    CDBA 0    DCBA 0
Therefore h(4,4) = 1; h(4,3) = 0; h(4,2) = 6; h(4,1) = 8; h(4,0) = 9.
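The tally above can be reproduced by brute force. The following sketch enumerates all permutations, so it is only practical for small n; the function name `hat_counts` is an invented label, and fans and hats are numbered 0 to n-1 rather than lettered.

```python
from itertools import permutations
from collections import Counter

def hat_counts(n):
    """Return [h(n,0), h(n,1), ..., h(n,n)] by enumerating all n! landings."""
    counts = Counter()
    for perm in permutations(range(n)):
        # perm[fan] is the hat that lands on that fan; count the rightful owners.
        fixed = sum(1 for fan, hat in enumerate(perm) if fan == hat)
        counts[fixed] += 1
    return [counts[k] for k in range(n + 1)]

print(hat_counts(4))  # [9, 8, 6, 0, 1]
```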
We can determine h(n, k) by noticing that it is the number of ways to
choose k lucky hat owners, namely $\binom{n}{k}$, times the number of ways to arrange
the remaining n-k hats so that none of them goes to the right owner, namely
h(n-k, 0). A permutation is called a derangement if it moves every item,
and the number of derangements of n objects is sometimes denoted by the
symbol ‘n¡’, read “n subfactorial.” Therefore h(n-k, 0) = (n-k)¡, and we
have the general formula
$$h(n,k) = \binom{n}{k}h(n-k,0) = \binom{n}{k}(n-k)\text{¡}.$$
(Subfactorial notation isn’t standard, and it’s not clearly a great idea; but
let’s try it awhile to see if we grow to like it. We can always resort to ‘$D_n$’ or
something, if ‘n¡’ doesn’t work out.)
Our problem would be solved if we had a closed form for n¡, so let’s see
what we can find. There’s an easy way to get a recurrence, because the sum
of h(n, k) for all k is the total number of permutations of n hats:
$$n! = \sum_k h(n,k) = \sum_k \binom{n}{k}(n-k)\text{¡} = \sum_k \binom{n}{k}k\text{¡}, \qquad \text{integer } n \ge 0. \tag{5.49}$$
(We’ve changed k to n-k and $\binom{n}{n-k}$ to $\binom{n}{k}$ in the last step.) With this
implicit recurrence we can compute all the h(n, k)’s we like:
n   h(n,0)  h(n,1)  h(n,2)  h(n,3)  h(n,4)  h(n,5)  h(n,6)
0      1
1      0       1
2      1       0       1
3      2       3       0       1
4      9       8       6       0       1
5     44      45      20      10       0       1
6    265     264     135      40      15       0       1
For example, here’s how the row for n = 4 can be computed: The two rightmost
entries are obvious: there’s just one way for all hats to land correctly,
and there’s no way for just three fans to get their own. (Whose hat would the
fourth fan get?) When k = 2 and k = 1, we can use our equation for h(n, k),
giving $h(4,2) = \binom{4}{2}h(2,0) = 6\cdot1 = 6$, and $h(4,1) = \binom{4}{1}h(3,0) = 4\cdot2 = 8$. We
can’t use this equation for h(4,0); rather, we can, but it gives us $h(4,0) =
\binom{4}{0}h(4,0)$, which is true but useless. Taking another tack, we can use the
relation h(4,0) + 8 + 6 + 0 + 1 = 4! to deduce that h(4,0) = 9; this is the value
of 4¡. Similarly n¡ depends on the values of k¡ for k < n.
The art of mathematics, as of life, is knowing which truths are useless.
Baseball fans: .367 is also Ty Cobb’s lifetime batting average, the all-time
record. Can this be a coincidence?
(Hey wait, you’re fudging. Cobb’s average was 4191/11429 ≈ .366699, while
1/e ≈ .367879. But maybe if Wade Boggs has a few really good seasons . . . )
How can we solve a recurrence like (5.49)? Easy; it has the form of (5.48),
with $g(n) = n!$ and $f(k) = (-1)^k k\text{¡}$. Hence its solution is
$$n\text{¡} = (-1)^n\sum_k \binom{n}{k}(-1)^k k!.$$
Well, this isn’t really a solution; it’s a sum that should be put into closed form
if possible. But it’s better than a recurrence. The sum can be simplified, since
k! cancels with a hidden k! in $\binom{n}{k}$, so let’s try that: We get
$$n\text{¡} = \sum_{0\le k\le n}\frac{n!}{(n-k)!}(-1)^{n+k} = n!\sum_{0\le k\le n}\frac{(-1)^k}{k!}. \tag{5.50}$$
The remaining sum converges rapidly to the number $\sum_{k\ge0}(-1)^k/k! = e^{-1}$.
In fact, the terms that are excluded from the sum are
$$\begin{aligned}
n!\sum_{k>n}\frac{(-1)^k}{k!} &= \frac{(-1)^{n+1}}{n+1}\sum_{k\ge0}(-1)^k\frac{(n+1)!}{(k+n+1)!}\\
&= \frac{(-1)^{n+1}}{n+1}\left(1 - \frac{1}{n+2} + \frac{1}{(n+2)(n+3)} - \cdots\right),
\end{aligned}$$
and the parenthesized quantity lies between 1 and $1 - \frac{1}{n+2} = \frac{n+1}{n+2}$. Therefore
the difference between n¡ and n!/e is roughly 1/n in absolute value; more
precisely, it lies between 1/(n+1) and 1/(n+2). But n¡ is an integer.
Therefore it must be what we get when we round n!/e to the nearest integer,
if n > 0. So we have the closed form we seek:
$$n\text{¡} = \left\lfloor\frac{n!}{e} + \frac{1}{2}\right\rfloor + [n=0]. \tag{5.51}$$
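As a sanity check, the rounding formula can be compared with values obtained directly from the implicit recurrence $n! = \sum_k \binom{n}{k}k\text{¡}$. This is a sketch with invented function names; the closed form is evaluated in floating point, so the comparison is limited to small n where `float` precision is ample.

```python
from math import comb, e, factorial, floor

def subfactorials(limit):
    """Solve n! = sum_k C(n,k) k¡ for n¡, one n at a time."""
    d = []
    for n in range(limit + 1):
        # Every term with k < n is already known; the k = n term is n¡ itself.
        d.append(factorial(n) - sum(comb(n, k) * d[k] for k in range(n)))
    return d

def closed_form(n):
    """floor(n!/e + 1/2) + [n = 0], evaluated with floating-point division."""
    return floor(factorial(n) / e + 0.5) + (1 if n == 0 else 0)

exact = subfactorials(12)
print(exact[:7])  # [1, 0, 1, 2, 9, 44, 265]
```

For much larger n an exact evaluation of (5.50) with integers would be the safer route, since n!/e eventually exceeds what a float can represent to the nearest unit.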
This is the number of ways that no fan gets the right hat back. When
n is large, it’s more meaningful to know the probability that this happens.
If we assume that each of the n! arrangements is equally likely (because the
hats were thrown extremely high), this probability is
$$\frac{n\text{¡}}{n!} = \frac{n!/e + O(1)}{n!} \sim \frac{1}{e} = .367\ldots.$$
So when n gets large the probability that all hats are misplaced is almost 37%.
Incidentally, recurrence (5.49) for subfactorials is exactly the same as
(5.46), the first recurrence considered by Stirling when he was trying to
generalize the factorial function. Hence $s_k = k\text{¡}$. These coefficients are so large,
it’s no wonder the infinite series (5.46) diverges for noninteger x.
Before leaving this problem, let’s look briefly at two interesting patterns
that leap out at us in the table of small h(n, k). First, it seems that the num-
bers 1, 3, 6, 10, 15, . . . below the all-0 diagonal are the triangular numbers.
This observation is easy to prove, since those table entries are the h(n, n-2)’s
and we have
$$h(n,n-2) = \binom{n}{n-2}2\text{¡} = \binom{n}{2}.$$
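It also seems that the numbers in the first two columns differ by ±1. Is
this always true? Yes,
$$\begin{aligned}
h(n,0) - h(n,1) &= n\text{¡} - n(n-1)\text{¡}\\
&= \left(n!\sum_{0\le k\le n}\frac{(-1)^k}{k!}\right) - \left(n(n-1)!\sum_{0\le k\le n-1}\frac{(-1)^k}{k!}\right)\\
&= n!\,\frac{(-1)^n}{n!} = (-1)^n.
\end{aligned}$$
In other words, $n\text{¡} = n(n-1)\text{¡} + (-1)^n$. This is a much simpler recurrence
for the derangement numbers than we had before.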
Now let’s invert something else. If we apply inversion to the formula
$$\sum_{k\ge0}\binom{n}{k}\frac{(-1)^k}{x+k} = \frac{1}{x}\binom{x+n}{n}^{-1}$$
that we derived in (5.41), we find
$$\frac{x}{x+n} = \sum_{k\ge0}\binom{n}{k}(-1)^k\binom{x+k}{k}^{-1}.$$
This is interesting, but not really new. If we negate the upper index in $\binom{x+k}{k}$,
we have merely discovered identity (5.33) again.
But inversion is the source of smog.
5.4 GENERATING FUNCTIONS
We come now to the most important idea in this whole book, the
notion of a generating function. An infinite sequence $\langle a_0, a_1, a_2, \ldots\rangle$ that
we wish to deal with in some way can conveniently be represented as a power
series in an auxiliary variable z,
$$A(z) = a_0 + a_1z + a_2z^2 + \cdots = \sum_{k\ge0}a_kz^k. \tag{5.52}$$
It’s appropriate to use the letter z as the name of the auxiliary variable, be-
cause we’ll often be thinking of z as a complex number. The theory of complex
variables conventionally uses ‘z’ in its formulas; power series (a.k.a. analytic
functions or holomorphic functions) are central to that theory.
We will be seeing lots of generating functions in subsequent chapters.
Indeed, Chapter 7 is entirely devoted to them. Our present goal is simply to
introduce the basic concepts, and to demonstrate the relevance of generating
functions to the study of binomial coefficients.
A generating function is useful because it’s a single quantity that repre-
sents an entire infinite sequence. We can often solve problems by first setting
up one or more generating functions, then by fooling around with those func-
tions until we know a lot about them, and finally by looking again at the
coefficients. With a little bit of luck, we’ll know enough about the function
to understand what we need to know about its coefficients.
If A(z) is any power series $\sum_{k\ge0}a_kz^k$, we will find it convenient to write
$$[z^n]\,A(z) = a_n; \tag{5.53}$$
in other words, $[z^n]\,A(z)$ denotes the coefficient of $z^n$ in A(z).
Let A(z) be the generating function for $\langle a_0, a_1, a_2, \ldots\rangle$ as in (5.52), and
let B(z) be the generating function for another sequence $\langle b_0, b_1, b_2, \ldots\rangle$. Then
the product A(z)B(z) is the power series
$$(a_0 + a_1z + a_2z^2 + \cdots)(b_0 + b_1z + b_2z^2 + \cdots) = a_0b_0 + (a_0b_1 + a_1b_0)z + (a_0b_2 + a_1b_1 + a_2b_0)z^2 + \cdots;$$
the coefficient of $z^n$ in this product is
$$a_0b_n + a_1b_{n-1} + \cdots + a_nb_0 = \sum_{k=0}^n a_kb_{n-k}.$$
Therefore if we wish to evaluate any sum that has the general form
$$c_n = \sum_{k=0}^n a_kb_{n-k}, \tag{5.54}$$
and if we know the generating functions A(z) and B(z), we have
$$c_n = [z^n]\,A(z)B(z).$$
The sequence $\langle c_n\rangle$ defined by (5.54) is called the convolution of the sequences
$\langle a_n\rangle$ and $\langle b_n\rangle$; two sequences are “convolved” by forming the sums of
all products whose subscripts add up to a given amount. The gist of the previ-
ous paragraph is that convolution of sequences corresponds to multiplication
of their generating functions.
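For a concrete feel, a convolution of two truncated coefficient lists can be computed directly from (5.54). This is a small sketch with an invented helper name `convolve`; it assumes both lists hold the first N coefficients of their series, padded with zeros, and it returns the first N coefficients of the product.

```python
def convolve(a, b):
    """Return c_0..c_{N-1} with c_n = sum_k a_k b_{n-k}, where N = min(len(a), len(b))."""
    n_terms = min(len(a), len(b))
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(n_terms)]

# Coefficients of (1+z)^2 and (1+z)^3, padded with zeros to six terms.
a = [1, 2, 1, 0, 0, 0]
b = [1, 3, 3, 1, 0, 0]
print(convolve(a, b))  # [1, 5, 10, 10, 5, 1], the coefficients of (1+z)^5
```

Multiplying the two polynomials by hand gives the same list, which is the correspondence between convolution and multiplication in miniature.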
Generating functions give us powerful ways to discover and/or prove
identities. For example, the binomial theorem tells us that $(1+z)^r$ is the
generating function for the sequence $\langle\binom{r}{0}, \binom{r}{1}, \binom{r}{2}, \ldots\rangle$:
$$(1+z)^r = \sum_{k\ge0}\binom{r}{k}z^k.$$
Similarly,
$$(1+z)^s = \sum_{k\ge0}\binom{s}{k}z^k.$$
If we multiply these together, we get another generating function:
$$(1+z)^r(1+z)^s = (1+z)^{r+s}.$$
And now comes the punch line: Equating coefficients of $z^n$ on both sides of
this equation gives us
$$\sum_{k=0}^n\binom{r}{k}\binom{s}{n-k} = \binom{r+s}{n}.$$
We’ve discovered Vandermonde’s convolution, (5.27)!
[(5.27)! = (5.27)(4.27)(3.27)(2.27)(1.27)(0.27)!.]
That was nice and easy; let’s try another. This time we use $(1-z)^r$, which
is the generating function for the sequence $\langle(-1)^n\binom{r}{n}\rangle = \langle\binom{r}{0}, -\binom{r}{1}, \binom{r}{2}, \ldots\rangle$.
Multiplying by $(1+z)^r$ gives another generating function whose coefficients
we know:
$$(1-z)^r(1+z)^r = (1-z^2)^r.$$
Equating coefficients of $z^n$ now gives the equation
$$\sum_{k=0}^n\binom{r}{k}\binom{r}{n-k}(-1)^k = (-1)^{n/2}\binom{r}{n/2}\,[n\text{ even}]. \tag{5.55}$$
We should check this on a small case or two. When n = 3, for example,
the result is
$$\binom{r}{0}\binom{r}{3} - \binom{r}{1}\binom{r}{2} + \binom{r}{2}\binom{r}{1} - \binom{r}{3}\binom{r}{0} = 0.$$
Each positive term is cancelled by a corresponding negative term. And the
same thing happens whenever n is odd, in which case the sum isn’t very
interesting. But when n is even, say n = 2, we get a nontrivial sum that’s
different from Vandermonde’s convolution:
$$\binom{r}{0}\binom{r}{2} - \binom{r}{1}\binom{r}{1} + \binom{r}{2}\binom{r}{0} = 2\binom{r}{2} - r^2 = -r.$$
So (5.55) checks out fine when n = 2. It turns out that (5.30) is a special case
of our new identity (5.55).
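The same kind of small-case check can be automated. The sketch below tests (5.55) only for nonnegative integer r, since it relies on `math.comb`; the identity itself holds for all real r, and the function names `lhs` and `rhs` are invented for this illustration.

```python
from math import comb

def lhs(r, n):
    """sum_{k=0}^{n} C(r,k) C(r,n-k) (-1)^k"""
    return sum(comb(r, k) * comb(r, n - k) * (-1) ** k for k in range(n + 1))

def rhs(r, n):
    """(-1)^(n/2) C(r, n/2) when n is even, and 0 when n is odd."""
    if n % 2:
        return 0
    return (-1) ** (n // 2) * comb(r, n // 2)

agree = all(lhs(r, n) == rhs(r, n) for r in range(8) for n in range(12))
print(agree, lhs(3, 2))  # the second value is -r for r = 3
```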
Binomial coefficients also show up in some other generating functions,
most notably the following important identities in which the lower index
stays fixed and the upper index varies:
$$\frac{1}{(1-z)^{n+1}} = \sum_{k\ge0}\binom{n+k}{n}z^k, \qquad \text{integer } n \ge 0; \tag{5.56}$$
$$\frac{z^n}{(1-z)^{n+1}} = \sum_{k\ge0}\binom{k}{n}z^k, \qquad \text{integer } n \ge 0. \tag{5.57}$$
If you have a highlighter pen, these two equations have got to be marked.
The second identity here is just the first one multiplied by $z^n$, that is, “shifted
right” by n places. The first identity is just a special case of the binomial
theorem in slight disguise: If we expand $(1-z)^{-n-1}$ by (5.13), the coefficient
of $z^k$ is $\binom{-n-1}{k}(-1)^k$, which can be rewritten as $\binom{k+n}{k}$ or $\binom{n+k}{n}$ by negating
the upper index. These special cases are worth noting explicitly, because they
arise so frequently in applications.
When n = 0 we get a special case of a special case, the geometric series:
$$\frac{1}{1-z} = 1 + z + z^2 + z^3 + \cdots = \sum_{k\ge0}z^k.$$
This is the generating function for the sequence $\langle 1, 1, 1, \ldots\rangle$, and it is
especially useful because the convolution of any other sequence with this one is
the sequence of sums: When $b_k = 1$ for all k, (5.54) reduces to
$$c_n = \sum_{k=0}^n a_k.$$
Therefore if A(z) is the generating function for the summands $\langle a_0, a_1, a_2, \ldots\rangle$,
then $A(z)/(1-z)$ is the generating function for the sums $\langle c_0, c_1, c_2, \ldots\rangle$.
The problem of derangements, which we solved by inversion in connection
with hats and football fans, can be resolved with generating functions in an
interesting way. The basic recurrence
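$$n! = \sum_k \binom{n}{k}(n-k)\text{¡}$$
can be put into the form of a convolution if we expand $\binom{n}{k}$ in factorials and
divide both sides by n!:
$$1 = \sum_{k=0}^n \frac{1}{k!}\,\frac{(n-k)\text{¡}}{(n-k)!}.$$
The generating function for the sequence $\langle\frac{1}{0!}, \frac{1}{1!}, \frac{1}{2!}, \ldots\rangle$ is $e^z$; hence if we let
$$D(z) = \sum_{k\ge0}\frac{k\text{¡}}{k!}z^k,$$
the convolution/recurrence tells us that
$$\frac{1}{1-z} = e^zD(z).$$
Solving for D(z) gives
$$D(z) = \frac{1}{1-z}e^{-z} = \frac{1}{1-z}\left(\frac{1}{0!}z^0 - \frac{1}{1!}z^1 + \frac{1}{2!}z^2 - \cdots\right).$$
Equating coefficients of $z^n$ now tells us that
$$\frac{n\text{¡}}{n!} = \sum_{k=0}^n\frac{(-1)^k}{k!};$$
this is the formula we derived earlier by inversion.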
So far our explorations with generating functions have given us slick
proofs of things that we already knew how to derive by more cumbersome
methods. But we haven’t used generating functions to obtain any new re-
sults, except for (5.55). Now we’re ready for something new and more sur-
prising. There are two families of power series that generate an especially rich
class of binomial coefficient identities: Let us define the generalized binomial
series $\mathcal{B}_t(z)$ and the generalized exponential series $\mathcal{E}_t(z)$ as follows:
$$\mathcal{B}_t(z) = \sum_{k\ge0}(tk)^{\underline{k-1}}\frac{z^k}{k!}; \qquad \mathcal{E}_t(z) = \sum_{k\ge0}(tk+1)^{k-1}\frac{z^k}{k!}. \tag{5.58}$$
It can be shown that these functions satisfy the identities
$$\mathcal{B}_t(z)^{1-t} - \mathcal{B}_t(z)^{-t} = z; \qquad \mathcal{E}_t(z)^{-t}\ln\mathcal{E}_t(z) = z. \tag{5.59}$$
In the special case t = 0, we have
$$\mathcal{B}_0(z) = 1 + z; \qquad \mathcal{E}_0(z) = e^z;$$
this explains why the series with parameter t are called “generalized” binomials
and exponentials.
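The following pairs of identities are valid for all real r:
$$\mathcal{B}_t(z)^r = \sum_{k\ge0}\binom{tk+r}{k}\frac{r}{tk+r}z^k; \qquad \mathcal{E}_t(z)^r = \sum_{k\ge0}r\,\frac{(tk+r)^{k-1}}{k!}z^k; \tag{5.60}$$
$$\frac{\mathcal{B}_t(z)^r}{1-t+t\mathcal{B}_t(z)^{-1}} = \sum_{k\ge0}\binom{tk+r}{k}z^k; \qquad \frac{\mathcal{E}_t(z)^r}{1-zt\mathcal{E}_t(z)^t} = \sum_{k\ge0}\frac{(tk+r)^k}{k!}z^k. \tag{5.61}$$
(When tk + r = 0, we have to be a little careful about how the coefficient
of $z^k$ is interpreted; each coefficient is a polynomial in r. For example, the
constant term of $\mathcal{E}_t(z)^r$ is $r(0+r)^{-1}$, and this is equal to 1 even when r = 0.)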
Since equations (5.60) and (5.61) hold for all r, we get very general identities
when we multiply together the series that correspond to different powers
r and s. For example,
$$\begin{aligned}
\mathcal{B}_t(z)^r\,\frac{\mathcal{B}_t(z)^s}{1-t+t\mathcal{B}_t(z)^{-1}} &= \sum_{k\ge0}\binom{tk+r}{k}\frac{r}{tk+r}z^k\sum_{j\ge0}\binom{tj+s}{j}z^j\\
&= \sum_{n\ge0}z^n\sum_{k\ge0}\binom{tk+r}{k}\frac{r}{tk+r}\binom{t(n-k)+s}{n-k}.
\end{aligned}$$
This power series must equal
$$\frac{\mathcal{B}_t(z)^{r+s}}{1-t+t\mathcal{B}_t(z)^{-1}} = \sum_{n\ge0}\binom{tn+r+s}{n}z^n;$$
hence we can equate coefficients of $z^n$ and get the identity
$$\sum_k\binom{tk+r}{k}\binom{t(n-k)+s}{n-k}\frac{r}{tk+r} = \binom{tn+r+s}{n}, \qquad \text{integer } n,$$
valid for all real r, s, and t. When t = 0 this identity reduces to Vandermonde’s
convolution. (If by chance tk + r happens to equal zero in this
formula, the denominator factor tk + r should be considered to cancel with
the tk + r in the numerator of the binomial coefficient. Both sides of the identity
are polynomials in r, s, and t.) Similar identities hold when we multiply
$\mathcal{B}_t(z)^r$ by $\mathcal{B}_t(z)^s$, etc.; Table 202 presents the results.
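Table 202 General convolution identities, valid for integer n ≥ 0.
$$\sum_k\binom{tk+r}{k}\binom{tn-tk+s}{n-k}\frac{r}{tk+r} = \binom{tn+r+s}{n}. \tag{5.62}$$
$$\sum_k\binom{tk+r}{k}\binom{tn-tk+s}{n-k}\frac{r}{tk+r}\cdot\frac{s}{tn-tk+s} = \binom{tn+r+s}{n}\frac{r+s}{tn+r+s}. \tag{5.63}$$
$$\sum_k\binom{n}{k}(tk+r)^k(tn-tk+s)^{n-k}\frac{r}{tk+r} = (tn+r+s)^n. \tag{5.64}$$
$$\sum_k\binom{n}{k}(tk+r)^k(tn-tk+s)^{n-k}\frac{r}{tk+r}\cdot\frac{s}{tn-tk+s} = (tn+r+s)^n\,\frac{r+s}{tn+r+s}. \tag{5.65}$$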
We have learned that it’s generally a good idea to look at special cases of
general results. What happens, for example, if we set t = 1? The generalized
binomial $\mathcal{B}_1(z)$ is very simple; it’s just
$$\mathcal{B}_1(z) = \sum_{k\ge0}z^k = \frac{1}{1-z};$$
therefore $\mathcal{B}_1(z)$ doesn’t give us anything we didn’t already know from
Vandermonde’s convolution. But $\mathcal{E}_1(z)$ is an important function,
$$\mathcal{E}(z) = \sum_{k\ge0}(k+1)^{k-1}\frac{z^k}{k!} = 1 + z + \frac{3}{2}z^2 + \frac{8}{3}z^3 + \frac{125}{24}z^4 + \cdots \tag{5.66}$$
that we haven’t seen before; it satisfies the basic identity
$$\mathcal{E}(z) = e^{z\mathcal{E}(z)}. \tag{5.67}$$
This function, first studied by Eisenstein [75], arises in many applications.
Ah! This is the iterated power function $\mathcal{E}(\ln z) = z^{z^{z^{\cdot^{\cdot^{\cdot}}}}}$ that I’ve often
wondered about.
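The special cases t = 2 and t = -1 of the generalized binomial are of
particular interest, because their coefficients occur again and again in problems
that have a recursive structure. Therefore it’s useful to display these
series explicitly for future reference:
$$\mathcal{B}_2(z) = \sum_k\binom{2k}{k}\frac{z^k}{1+k} = \frac{1-\sqrt{1-4z}}{2z}. \tag{5.68}$$
$$\mathcal{B}_{-1}(z) = \sum_k\binom{1-k}{k}\frac{z^k}{1-k} = \frac{1+\sqrt{1+4z}}{2}. \tag{5.69}$$
$$\mathcal{B}_2(z)^r = \sum_k\binom{2k+r}{k}\frac{r}{2k+r}z^k. \tag{5.70}$$
$$\mathcal{B}_{-1}(z)^r = \sum_k\binom{r-k}{k}\frac{r}{r-k}z^k. \tag{5.71}$$
$$\frac{\mathcal{B}_2(z)^r}{\sqrt{1-4z}} = \sum_k\binom{2k+r}{k}z^k. \tag{5.72}$$
$$\frac{\mathcal{B}_{-1}(z)^{r+1}}{\sqrt{1+4z}} = \sum_k\binom{r-k}{k}z^k. \tag{5.73}$$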
The coefficients $\binom{2n}{n}\frac{1}{n+1}$ of $\mathcal{B}_2(z)$ are called the Catalan numbers $C_n$, because
Eugène Catalan wrote an influential paper about them in the 1830s [46]. The
sequence begins as follows:
n      0  1  2  3   4   5    6    7     8     9     10
C_n    1  1  2  5  14  42  132  429  1430  4862  16796
The coefficients of $\mathcal{B}_{-1}(z)$ are essentially the same, but there’s an extra 1 at the
beginning and the other numbers alternate in sign: $\langle 1, 1, -1, 2, -5, 14, \ldots\rangle$.
Thus $\mathcal{B}_{-1}(z) = 1 + z\mathcal{B}_2(-z)$. We also have $\mathcal{B}_{-1}(z) = \mathcal{B}_2(-z)^{-1}$.
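The Catalan table and the sign pattern of the $t = -1$ series can be checked with integer arithmetic. The sketch below assumes the functional equations that (5.59) gives for t = 2 and t = -1, namely $B = 1 + zB^2$ and $B^2 - B = z$; the helper name `convolve` and the variable names are invented for illustration.

```python
from math import comb

def convolve(a, b):
    """First len(a) coefficients of the product of two truncated power series."""
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(len(a))]

catalan = [comb(2 * n, n) // (n + 1) for n in range(11)]
print(catalan)  # [1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796]

# t = 2: B = 1 + z*B^2, so C_{n+1} = sum_k C_k C_{n-k}.
squared = convolve(catalan, catalan)
shift_ok = all(catalan[n + 1] == squared[n] for n in range(10))

# t = -1: coefficients are 1 followed by (-1)^k C_k, and B^2 - B should equal z.
b_minus1 = [1] + [(-1) ** k * catalan[k] for k in range(10)]
residual = [s - c for s, c in zip(convolve(b_minus1, b_minus1), b_minus1)]
print(shift_ok, residual)  # residual is [0, 1, 0, 0, ...]
```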
Let’s close this section by deriving an important consequence of (5.72)
and (5.73), a relation that shows further connections between the functions
$\mathcal{B}_{-1}(z)$ and $\mathcal{B}_2(-z)$:
$$\frac{\mathcal{B}_{-1}(z)^{n+1} - (-z)^{n+1}\mathcal{B}_2(-z)^{n+1}}{\sqrt{1+4z}} = \sum_{k\le n}\binom{n-k}{k}z^k.$$
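This holds because the coefficient of $z^k$ in $(-z)^{n+1}\mathcal{B}_2(-z)^{n+1}/\sqrt{1+4z}$ is
$$\begin{aligned}
[z^k]\,\frac{(-z)^{n+1}\mathcal{B}_2(-z)^{n+1}}{\sqrt{1+4z}} &= (-1)^{n+1}\,[z^{k-n-1}]\,\frac{\mathcal{B}_2(-z)^{n+1}}{\sqrt{1+4z}}\\
&= (-1)^{n+1}(-1)^{k-n-1}\,[z^{k-n-1}]\,\frac{\mathcal{B}_2(z)^{n+1}}{\sqrt{1-4z}}\\
&= (-1)^k\binom{2(k-n-1)+n+1}{k-n-1}\\
&= (-1)^k\binom{2k-n-1}{k-n-1} = (-1)^k\binom{2k-n-1}{k}\\
&= \binom{n-k}{k} = [z^k]\,\frac{\mathcal{B}_{-1}(z)^{n+1}}{\sqrt{1+4z}}
\end{aligned}$$
when k > n. The terms nicely cancel each other out. We can now use (5.68)
and (5.69) to obtain the closed form
$$\sum_{k\le n}\binom{n-k}{k}z^k = \frac{1}{\sqrt{1+4z}}\left(\left(\frac{1+\sqrt{1+4z}}{2}\right)^{n+1} - \left(\frac{1-\sqrt{1+4z}}{2}\right)^{n+1}\right), \qquad \text{integer } n \ge 0. \tag{5.74}$$
(The special case z = -1 came up in Problem 3 of Section 5.2. Since the
numbers $\frac{1}{2}(1 \pm \sqrt{-3})$ are sixth roots of unity, the sums $\sum_{k\le n}\binom{n-k}{k}(-1)^k$
have the periodic behavior we observed in that problem.) Similarly we can
combine (5.70) with (5.71) to cancel the large coefficients and get
$$\sum_{k<n}\binom{n-k}{k}\frac{n}{n-k}z^k = \left(\frac{1+\sqrt{1+4z}}{2}\right)^n + \left(\frac{1-\sqrt{1+4z}}{2}\right)^n, \qquad \text{integer } n > 0. \tag{5.75}$$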
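A convenient way to test these closed forms without floating point is to pick z = 2, where $\sqrt{1+4z} = 3$ and the two bases become 2 and -1. Under that choice the sum $\sum_{k\le n}\binom{n-k}{k}2^k$ should equal $(2^{n+1} - (-1)^{n+1})/3$, and the sum $\sum_{k<n}\binom{n-k}{k}\frac{n}{n-k}2^k$ should equal $2^n + (-1)^n$. The sketch below, with invented function names, checks both for small n.

```python
from fractions import Fraction
from math import comb

def sum_574(n, z):
    """sum over 0 <= k <= n of C(n-k, k) z^k"""
    return sum(comb(n - k, k) * z ** k for k in range(n + 1))

def sum_575(n, z):
    """sum over 0 <= k < n of C(n-k, k) * n/(n-k) * z^k, kept exact with Fraction"""
    return sum(Fraction(n, n - k) * comb(n - k, k) * z ** k for k in range(n))

ok_574 = all(3 * sum_574(n, 2) == 2 ** (n + 1) - (-1) ** (n + 1) for n in range(16))
ok_575 = all(sum_575(n, 2) == 2 ** n + (-1) ** n for n in range(1, 16))
print(ok_574, ok_575)
```

Other values of z work too, but then the square root is usually irrational and the comparison has to be done numerically.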
5.5 HYPERGEOMETRIC FUNCTIONS
The methods we’ve been applying to binomial coefficients are very
effective, when they work, but we must admit that they often appear to be
ad hoc, more like tricks than techniques. When we’re working on a problem,
we often have many directions to pursue, and we might find ourselves going
around in circles. Binomial coefficients are like chameleons, changing their
appearance easily. Therefore it’s natural to ask if there isn’t some unifying
principle that will systematically handle a great variety of binomial coefficient
summations all at once. Fortunately, the answer is yes. The unifying principle
is based on the theory of certain infinite sums called hypergeometric series.
They’re even more versatile than chameleons; we can dissect them and put them
back together in different ways.
The study of hypergeometric series was launched many years ago by Euler,
Gauss, and Riemann; such series, in fact, are still the subject of considerable
research. But hypergeometrics have a somewhat formidable notation,
which takes a little time to get used to.
Anything that has survived for centuries with such awesome notation must be
really useful.
The general hypergeometric series is a power series in z with m + n
parameters, and it is defined as follows in terms of rising factorial powers:
$$F\!\left(\begin{matrix}a_1,\ldots,a_m\\ b_1,\ldots,b_n\end{matrix}\,\middle|\,z\right) = \sum_{k\ge0}\frac{a_1^{\overline{k}}\cdots a_m^{\overline{k}}}{b_1^{\overline{k}}\cdots b_n^{\overline{k}}}\,\frac{z^k}{k!}. \tag{5.76}$$
To avoid division by zero, none of the b’s may be zero or a negative integer.
Other than that, the a’s and b’s may be anything we like. The notation
‘$F(a_1, \ldots, a_m; b_1, \ldots, b_n; z)$’ is also used as an alternative to the two-line form
(5.76), since a one-line form sometimes works better typographically. The a’s
are said to be upper parameters; they occur in the numerator of the terms
of F. The b’s are lower parameters, and they occur in the denominator. The
final quantity z is called the argument.
Standard reference books often use ‘${}_mF_n$’ instead of ‘F’ as the name of a
hypergeometric with m upper parameters and n lower parameters. But the
extra subscripts tend to clutter up the formulas and waste our time, if we’re
compelled to write them over and over. We can count how many parameters
there are, so we usually don’t need extra additional unnecessary redundancy.
Many important functions occur as special cases of the general hypergeometric;
indeed, that’s why hypergeometrics are so powerful. For example, the
simplest case occurs when m = n = 0: There are no parameters at all, and
we get the familiar series
$$F\!\left(\ \Big|\ z\right) = \sum_{k\ge0}\frac{z^k}{k!} = e^z.$$
Actually the notation looks a bit unsettling when m or n is zero. We can add
an extra ‘1’ above and below in order to avoid this:
$$F\!\left(\begin{matrix}1\\ 1\end{matrix}\,\middle|\,z\right) = e^z.$$
In general we don’t change the function if we cancel a parameter that occurs
in both numerator and denominator, or if we insert two identical parameters.
The next simplest case has m = 1, $a_1 = 1$, and n = 0; we change the
parameters to m = 2, $a_1 = a_2 = 1$, n = 1, and $b_1 = 1$, so that n > 0. This
series also turns out to be familiar, because $1^{\overline{k}} = k!$:
$$F\!\left(\begin{matrix}1,1\\ 1\end{matrix}\,\middle|\,z\right) = \sum_{k\ge0}z^k = \frac{1}{1-z}.$$
It’s our old friend, the geometric series; $F(a_1, \ldots, a_m; b_1, \ldots, b_n; z)$ is called
hypergeometric because it includes the geometric series F(1,1; 1; z) as a very
special case.
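The general case m = 1 and n = 0 is, in fact, easy to sum in closed form,
$$F\!\left(\begin{matrix}a,1\\ 1\end{matrix}\,\middle|\,z\right) = \sum_{k\ge0}a^{\overline{k}}\frac{z^k}{k!} = \sum_k\binom{a+k-1}{k}z^k = \frac{1}{(1-z)^a}, \tag{5.77}$$
using (5.56). If we replace a by -a and z by -z, we get the binomial theorem,
$$F\!\left(\begin{matrix}-a,1\\ 1\end{matrix}\,\middle|\,-z\right) = (1+z)^a.$$
A negative integer as upper parameter causes the infinite series to become
finite, since $(-a)^{\overline{k}} = 0$ whenever $k > a \ge 0$ and a is an integer.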
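The general case m = 0, n = 1 is another famous series, but it’s not as
well known in the literature of discrete mathematics:
$$F\!\left(\begin{matrix}1\\ b,1\end{matrix}\,\middle|\,z\right) = \sum_{k\ge0}\frac{(b-1)!}{(b-1+k)!}\,\frac{z^k}{k!} = I_{b-1}(2\sqrt{z})\,\frac{(b-1)!}{z^{(b-1)/2}}. \tag{5.78}$$
This function $I_{b-1}$ is called a “modified Bessel function” of order b-1. The
special case b = 1 gives us $F\!\left(\begin{matrix}1\\ 1,1\end{matrix}\,\middle|\,z\right) = I_0(2\sqrt{z})$, which is the interesting series
$\sum_{k\ge0}z^k/k!^2$.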
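The special case m = n = 1 is called a “confluent hypergeometric series”
and often denoted by the letter M:
$$F\!\left(\begin{matrix}a\\ b\end{matrix}\,\middle|\,z\right) = \sum_{k\ge0}\frac{a^{\overline{k}}}{b^{\overline{k}}}\,\frac{z^k}{k!} = M(a,b,z). \tag{5.79}$$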
This function, which has important applications to engineering, was intro-
duced by Ernst Kummer.
By now a few of us are wondering why we haven’t discussed convergence
of the infinite series (5.76). The answer is that we can ignore convergence if
we are using z simply as a formal symbol. It is not difficult to verify that
formal infinite sums of the form $\sum_{k\ge n}\alpha_kz^k$ form a field, if the coefficients
$\alpha_k$ lie in a field. We can add, subtract, multiply, divide, differentiate, and do
functional composition on such formal sums without worrying about convergence;
any identities we derive will still be formally true. For example, the
hypergeometric $F\!\left(\begin{matrix}1,1,1\\ 1\end{matrix}\,\middle|\,z\right) = \sum_{k\ge0}k!\,z^k$ doesn’t converge for any nonzero z;
yet we’ll see in Chapter 7 that we can still use it to solve problems. On the
other hand, whenever we replace z by a particular numerical value, we do
have to be sure that the infinite sum is well defined.
“There must be many universities to-day where 95 per cent, if not 100 per cent,
of the functions studied by physics, engineering, and even mathematics students,
are covered by this single symbol F(a,b;c;x).”
- W. W. Sawyer [257]
The next step up in complication is actually the most famous hypergeometric
of all. In fact, it was the hypergeometric series until about 1870, when
everything was generalized to arbitrary m and n. This one has two upper
parameters and one lower parameter:
$$F\!\left(\begin{matrix}a,b\\ c\end{matrix}\,\middle|\,z\right) = \sum_{k\ge0}\frac{a^{\overline{k}}\,b^{\overline{k}}}{c^{\overline{k}}}\,\frac{z^k}{k!}. \tag{5.80}$$
It is often called the Gaussian hypergeometric, because many of its subtle
properties were first proved by Gauss in his doctoral dissertation of 1812 [116],
although Euler [95] and Pfaff [233] had already discovered some remarkable
things about it. One of its important special cases is
$$\begin{aligned}
\ln(1+z) = z\,F\!\left(\begin{matrix}1,1\\ 2\end{matrix}\,\middle|\,-z\right) &= z\sum_{k\ge0}\frac{k!\,k!}{(k+1)!}\,\frac{(-z)^k}{k!}\\
&= z - \frac{z^2}{2} + \frac{z^3}{3} - \frac{z^4}{4} + \cdots.
\end{aligned}$$
Notice that $z^{-1}\ln(1+z)$ is a hypergeometric function, but $\ln(1+z)$ itself cannot
be hypergeometric, since a hypergeometric series always has the value 1 when
z = 0.
So far hypergeometrics haven’t actually done anything for us except pro-
vide an excuse for name-dropping. But we’ve seen that several very different
functions can all be regarded as hypergeometric; this will be the main point of
interest in what follows. We’ll see that a large class of sums can be written as
hypergeometric series in a “canonical” way, hence we will have a good filing
system for facts about binomial coefficients.
What series are hypergeometric? It’s easy to answer this question if we
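look at the ratio between consecutive terms:
$$F\!\left(\begin{matrix}a_1,\ldots,a_m\\ b_1,\ldots,b_n\end{matrix}\,\middle|\,z\right) = \sum_{k\ge0}t_k, \qquad t_k = \frac{a_1^{\overline{k}}\cdots a_m^{\overline{k}}}{b_1^{\overline{k}}\cdots b_n^{\overline{k}}}\,\frac{z^k}{k!}.$$
The first term is $t_0 = 1$, and the other terms have ratios given by
$$\begin{aligned}
\frac{t_{k+1}}{t_k} &= \frac{a_1^{\overline{k+1}}\cdots a_m^{\overline{k+1}}}{a_1^{\overline{k}}\cdots a_m^{\overline{k}}}\;\frac{b_1^{\overline{k}}\cdots b_n^{\overline{k}}}{b_1^{\overline{k+1}}\cdots b_n^{\overline{k+1}}}\;\frac{k!}{(k+1)!}\;\frac{z^{k+1}}{z^k}\\
&= \frac{(k+a_1)\cdots(k+a_m)\,z}{(k+b_1)\cdots(k+b_n)(k+1)}.
\end{aligned} \tag{5.81}$$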
This is a rational function of k, that is, a quotient of polynomials in k. Any
rational function of k can be factored over the complex numbers and put
into this form. The a’s are the negatives of the roots of the polynomial in
the numerator, and the b’s are the negatives of the roots of the polynomial
in the denominator. If the denominator doesn’t already contain the special
factor (k + 1), we can include (k + 1) in both numerator and denominator. A
constant factor remains, and we can call it z. Therefore hypergeometric series
are precisely those series whose first term is 1 and whose term ratio $t_{k+1}/t_k$
is a rational function of k.
Suppose, for example, that we’re given an infinite series with term ratio
$$\frac{t_{k+1}}{t_k} = \frac{k^2+7k+10}{4k^2+1},$$
a rational function of k. The numerator polynomial splits nicely into two
factors, (k + 2)(k + 5), and the denominator is 4(k + i/2)(k - i/2). Since the
denominator is missing the required factor (k + 1), we write the term ratio as
$$\frac{t_{k+1}}{t_k} = \frac{(k+2)(k+5)(k+1)(1/4)}{(k+i/2)(k-i/2)(k+1)},$$
and we can read off the results: The given series is
$$\sum_{k\ge0}t_k = t_0\,F\!\left(\begin{matrix}2,\,5,\,1\\ i/2,\,-i/2\end{matrix}\,\middle|\,\frac{1}{4}\right).$$
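One way to gain confidence in a parameter reading like this is to generate the terms twice, once from the term ratio and once from the rising-power definition of F, and compare. The sketch below takes $t_0 = 1$ and uses Python's built-in complex numbers for the parameters ±i/2; the helper names are invented, and agreement is checked only up to floating-point tolerance.

```python
from math import factorial

def rising(x, k):
    """Rising factorial power x(x+1)...(x+k-1)."""
    result = 1
    for j in range(k):
        result *= x + j
    return result

def hypergeometric_terms(upper, lower, z, count):
    """Terms prod(a^rising k) / prod(b^rising k) * z^k / k! for k < count."""
    terms = []
    for k in range(count):
        num, den = 1, 1
        for a in upper:
            num *= rising(a, k)
        for b in lower:
            den *= rising(b, k)
        terms.append(num / den * z ** k / factorial(k))
    return terms

def terms_from_ratio(count):
    """Terms generated from t_0 = 1 and t_{k+1}/t_k = (k^2+7k+10)/(4k^2+1)."""
    t = [1.0]
    for k in range(count - 1):
        t.append(t[-1] * (k * k + 7 * k + 10) / (4 * k * k + 1))
    return t

by_definition = hypergeometric_terms([2, 5, 1], [0.5j, -0.5j], 0.25, 12)
by_ratio = terms_from_ratio(12)
close = all(abs(x - y) <= 1e-9 * max(1.0, abs(y)) for x, y in zip(by_definition, by_ratio))
print(close, by_ratio[:3])  # first terms are 1, 10, 36
```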
Thus, we have a general method for finding the hypergeometric representation
of a given quantity S, when such a representation is possible: First we
write S as an infinite series whose first term is nonzero. We choose a notation
so that the series is $\sum_{k\ge0}t_k$ with $t_0 \ne 0$. Then we calculate $t_{k+1}/t_k$. If the
term ratio is not a rational function of k, we’re out of luck. Otherwise we
express it in the form (5.81); this gives parameters $a_1, \ldots, a_m$, $b_1, \ldots, b_n$,
and an argument z, such that $S = t_0\,F(a_1, \ldots, a_m; b_1, \ldots, b_n; z)$.
(Now is a good time to do warmup exercise 11.)
Gauss’s hypergeometric series can be written in the recursively factored
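form
$$F\!\left(\begin{matrix}a,b\\ c\end{matrix}\,\middle|\,z\right) = 1 + \frac{a}{1}\,\frac{b}{c}\,z\left(1 + \frac{a+1}{2}\,\frac{b+1}{c+1}\,z\left(1 + \frac{a+2}{3}\,\frac{b+2}{c+2}\,z\,(1 + \cdots)\right)\right)$$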
if we wish to emphasize the importance of term ratios.
Let’s try now to reformulate the binomial coefficient identities derived
earlier in this chapter, expressing them as hypergeometrics. For example,
let’s figure out what the parallel summation law,
$$\sum_{k\le n}\binom{r+k}{k} = \binom{r+n+1}{n}, \qquad \text{integer } n,$$
looks like in hypergeometric notation. We need to write the sum as an infinite
series that starts at k = 0, so we replace k by n - k:
$$\sum_{k\ge0}\binom{r+n-k}{n-k} = \sum_{k\ge0}\frac{(r+n-k)!}{r!\,(n-k)!} = \sum_{k\ge0}t_k.$$
This series is formally infinite but actually finite, because the (n - k)! in the
denominator will make $t_k = 0$ when k > n. (We’ll see later that 1/x! is
defined for all x, and that 1/x! = 0 when x is a negative integer. But for now,
let’s blithely disregard such technicalities until we gain more hypergeometric
experience.) The term ratio is
$$\begin{aligned}
\frac{t_{k+1}}{t_k} &= \frac{(r+n-k-1)!\;r!\;(n-k)!}{r!\;(n-k-1)!\;(r+n-k)!} = \frac{n-k}{r+n-k}\\
&= \frac{(k+1)(k-n)(1)}{(k-n-r)(k+1)}.
\end{aligned}$$
Furthermore $t_0 = \binom{r+n}{n}$. Hence the parallel summation law is equivalent to
the hypergeometric identity
$$\binom{r+n}{n}F\!\left(\begin{matrix}1,\,-n\\ -n-r\end{matrix}\,\middle|\,1\right) = \binom{r+n+1}{n}.$$
Dividing through by $\binom{r+n}{n}$ gives a slightly simpler version,
$$F\!\left(\begin{matrix}1,\,-n\\ -n-r\end{matrix}\,\middle|\,1\right) = \frac{r+n+1}{r+1}, \qquad \text{if } \binom{r+n}{n} \ne 0. \tag{5.82}$$
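The hypergeometric form of parallel summation, $F(1, -n; -n-r; 1) = (r+n+1)/(r+1)$, can be checked exactly for a non-integer r, where the lower parameter is never zero or a negative integer. The sketch below uses `fractions.Fraction` for exact arithmetic; the function names and the sample values r = 1/2 and r = 7/3 are arbitrary choices for illustration.

```python
from fractions import Fraction
from math import factorial

def rising(x, k):
    """Rising factorial power x(x+1)...(x+k-1)."""
    result = Fraction(1)
    for j in range(k):
        result *= x + j
    return result

def terminating_F(upper, lower, z, n_terms):
    """Sum the first n_terms terms of the hypergeometric series exactly."""
    total = Fraction(0)
    for k in range(n_terms):
        num, den = Fraction(1), Fraction(1)
        for a in upper:
            num *= rising(a, k)
        for b in lower:
            den *= rising(b, k)
        total += num / den * Fraction(z) ** k / factorial(k)
    return total

def check(r, n):
    # The upper parameter -n makes every term with k > n vanish, so n+1 terms suffice.
    return terminating_F([1, -n], [-n - r], 1, n + 1) == (r + n + 1) / (r + 1)

results = [check(Fraction(1, 2), n) for n in range(8)] + [check(Fraction(7, 3), 5)]
print(all(results))
```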
Let’s do another one. The term ratio of identity (5.16),
$$\sum_{k\le m}\binom{r}{k}(-1)^k = (-1)^m\binom{r-1}{m}, \qquad \text{integer } m,$$
is $(k-m)/(r-m+k+1) = (k+1)(k-m)(1)/\bigl((k-m+r+1)(k+1)\bigr)$, after
we replace k by m - k; hence (5.16) gives a closed form for
$$F\!\left(\begin{matrix}1,\,-m\\ -m+r+1\end{matrix}\,\middle|\,1\right).$$
This is essentially the same as the hypergeometric function on the left of
(5.82), but with m in place of n and r + 1 in place of -r. Therefore identity
(5.16) could have been derived from (5.82), the hypergeometric version of
(5.9). (No wonder we found it easy to prove (5.16) by using (5.9).)
Before we go further, we should think about degenerate cases, because
hypergeometrics are not defined when a lower parameter is zero or a negative
integer. We usually apply the parallel summation identity when r and n are
positive integers; but then -n-r is a negative integer and the hypergeometric
(5.76) is undefined. How then can we consider (5.82) to be legitimate? The
answer is that we can take the limit of $F\!\left(\begin{matrix}1,\,-n\\ -n-r+\epsilon\end{matrix}\,\middle|\,1\right)$ as $\epsilon \to 0$.
We will look at such things more closely later in this chapter, but for now
let’s just be aware that some denominators can be dynamite. It is interesting,
however, that the very first sum we’ve tried to express hypergeometrically
has turned out to be degenerate.
First derangements, now degenerates.
Another possibly sore point in our derivation of (5.82) is that we expanded $\binom{r+n-k}{n-k}$ as $(r+n-k)!/r!\,(n-k)!$. This expansion fails when $r$ is a negative integer, because $(-m)!$ has to be $\infty$ if the law
$$0! = 0\cdot(-1)\cdot(-2)\cdot\ldots\cdot(-m+1)\cdot(-m)!$$
is going to hold. Again, we need to approach integer results by considering a limit of $r+\epsilon$ as $\epsilon\to0$.
But we defined the factorial representation $\binom{r}{k} = r!/k!\,(r-k)!$ only when $r$ is an integer! If we want to work effectively with hypergeometrics, we need a factorial function that is defined for all complex numbers. Fortunately there is such a function, and it can be defined in many ways. Here’s one of the most useful definitions of $z!$, actually a definition of $1/z!$:
$$\frac{1}{z!} = \lim_{n\to\infty}\binom{n+z}{n}\,n^{-z}. \tag{5.83}$$
(See exercise 21. Euler [81] discovered this when he was 22 years old.) The limit can be shown to exist for all complex $z$, and it is zero only when $z$ is a negative integer. Another significant definition is
$$z! = \int_0^\infty t^z e^{-t}\,dt, \qquad\text{if } \Re z > -1. \tag{5.84}$$
This integral exists only when the real part of $z$ exceeds $-1$, but we can use the formula
$$z! = z\,(z-1)! \tag{5.85}$$
to extend (5.84) to all complex $z$ (except negative integers). Still another definition comes from Stirling’s interpolation of $\ln z!$ in (5.47). All of these approaches lead to the same generalized factorial function.
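To get a feeling for definition (5.83), we can evaluate the expression inside the limit at a large but finite $n$ and compare it with a library Gamma function, using $z!=\Gamma(z+1)$. This is only a numerical sketch; the function name `inv_factorial_limit` and the choice $n=10^5$ are assumptions made for illustration.

```python
import math

def inv_factorial_limit(z, n):
    """Approximate 1/z! by binom(n+z, n) * n**(-z) for a finite n (real z)."""
    prod = 1.0
    # binom(n+z, n) = product over j = 1..n of (z + j)/j
    for j in range(1, n + 1):
        prod *= (z + j) / j
    return prod * n ** (-z)

approx = inv_factorial_limit(0.5, 100_000)
exact = 1.0 / math.gamma(1.5)  # 1/(1/2)!
```

When $z$ is a negative integer, one of the factors $(z+j)$ is exactly zero, so the approximation is exactly zero for every $n\ge -z$, in agreement with the statement that $1/z!$ vanishes there.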
There’s a very similar function called the Gamma function, which relates to ordinary factorials somewhat as rising powers relate to falling powers. Standard reference books often use factorials and Gamma functions simultaneously, and it’s convenient to convert between them if necessary using the following formulas:
$$\Gamma(z+1) = z!\,; \tag{5.86}$$
$$(-z)!\;\Gamma(z) = \frac{\pi}{\sin\pi z}\,. \tag{5.87}$$
[Margin: (We proved the identities originally for integer $r$, and used the polynomial argument to show that they hold in general. Now we’re proving them first for irrational $r$, and using a limiting argument to show that they hold for integers!)]
We can use these generalized factorials to define generalized factorial powers, when $z$ and $w$ are arbitrary complex numbers:
$$z^{\underline{w}} = \frac{z!}{(z-w)!}\,; \tag{5.88}$$
$$z^{\overline{w}} = \frac{\Gamma(z+w)}{\Gamma(z)}\,. \tag{5.89}$$
[Margin: How do you write $z$ to the $\overline{w}$ power, when $\overline{w}$ is the complex conjugate of $w$?]
The only proviso is that we must use appropriate limiting values when these formulas give $\infty/\infty$. (The formulas never give $0/0$, because factorials and Gamma-function values are never zero.) A binomial coefficient can be written
$$\binom{z}{w} = \lim_{\zeta\to z}\;\lim_{\omega\to w}\;\frac{\zeta!}{\omega!\,(\zeta-\omega)!} \tag{5.90}$$
when $z$ and $w$ are any complex numbers whatever.
[Margin: I see, the lower index arrives at its limit first. That’s why $\binom{z}{w}$ is zero when $w$ is a negative integer.]
Armed with generalized factorial tools, we can return to our goal of reducing the identities derived earlier to their hypergeometric essences. The binomial theorem (5.13) turns out to be neither more nor less than (5.77), as we might expect. So the next most interesting identity to try is Vandermonde’s convolution (5.27):
$$\sum_k \binom{r}{k}\binom{s}{n-k} = \binom{r+s}{n},\qquad\text{integer } n.$$
The $k$th term here is
$$t_k = \frac{r!}{(r-k)!\,k!}\;\frac{s!}{(s-n+k)!\,(n-k)!}\,,$$
and we are no longer too shy to use generalized factorials in these expressions. Whenever $t_k$ contains a factor like $(\alpha+k)!$, with a plus sign before the $k$, we get $(\alpha+k+1)!/(\alpha+k)! = k+\alpha+1$ in the term ratio $t_{k+1}/t_k$, by (5.85); this contributes the parameter ‘$\alpha+1$’ to the corresponding hypergeometric: as an upper parameter if $(\alpha+k)!$ was in the numerator of $t_k$, but as a lower parameter otherwise. Similarly, a factor like $(\alpha-k)!$ leads to $(\alpha-k-1)!/(\alpha-k)! = (-1)/(k-\alpha)$; this contributes ‘$-\alpha$’ to the opposite set of parameters (reversing the roles of upper and lower), and negates the hypergeometric argument. Factors like $r!$, which are independent of $k$, go into $t_0$ but disappear from the term ratio. Using such tricks we can predict without further calculation that the term ratio of (5.27) is
$$\frac{t_{k+1}}{t_k} = \frac{k-r}{k+1}\;\frac{k-n}{k+s-n+1}$$
times $(-1)^2=1$, and Vandermonde’s convolution becomes
$$\binom{s}{n}\,F\!\left({-r,\,-n \atop s-n+1}\,\middle|\,1\right) = \binom{r+s}{n}. \tag{5.91}$$
We can use this equation to determine $F(a,b;c;z)$ in general, when $z=1$ and when $b$ is a negative integer.
Let’s rewrite (5.91) in a form so that table lookup is easy when a new sum needs to be evaluated. The result turns out to be
$$F\!\left({a,\,b \atop c}\,\middle|\,1\right) = \frac{\Gamma(c-a-b)\,\Gamma(c)}{\Gamma(c-a)\,\Gamma(c-b)}\,;\qquad \text{integer } b\le0 \ \text{ or }\ \Re c > \Re a + \Re b. \tag{5.92}$$
Vandermonde’s convolution (5.27) covers only the case that one of the upper parameters, say $b$, is a nonpositive integer; but Gauss proved that (5.92) is valid also when $a$, $b$, $c$ are complex numbers whose real parts satisfy $\Re c > \Re a + \Re b$. In other cases, the infinite series $F\!\left({a,\,b \atop c}\,\middle|\,1\right)$ doesn’t converge. When $b=-n$, the identity can be written more conveniently with factorial powers instead of Gamma functions:
$$F\!\left({a,\,-n \atop c}\,\middle|\,1\right) = \frac{(c-a)^{\overline{n}}}{c^{\overline{n}}} = \frac{(a-c)^{\underline{n}}}{(-c)^{\underline{n}}}\,,\qquad\text{integer } n\ge0. \tag{5.93}$$
[Margin: A few weeks ago, we were studying what Gauss had done in kindergarten. Now we’re studying stuff beyond his Ph.D. thesis. Is this intimidating or what?]
It turns out that all five of the identities in Table 169 are special cases of Vandermonde’s convolution; formula (5.93) covers them all, when proper attention is paid to degenerate situations.
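Formula (5.93) is easy to test with exact rational arithmetic, because the series terminates after $k=n$. The sketch below assumes that $c$ is not zero or a negative integer; the helper names `rising` and `F_at_one` are hypothetical.

```python
from fractions import Fraction

def rising(x, n):
    """Rising factorial power x(x+1)...(x+n-1), computed exactly."""
    result = Fraction(1)
    for j in range(n):
        result *= x + j
    return result

def F_at_one(a, n, c):
    """F(a, -n; c | 1), summed exactly; terms beyond k = n vanish."""
    total = Fraction(0)
    for k in range(n + 1):
        total += rising(a, k) * rising(-n, k) / (rising(c, k) * rising(1, k))
    return total

a, c = Fraction(1, 2), Fraction(7, 3)
agrees = all(F_at_one(a, n, c) == rising(c - a, n) / rising(c, n) for n in range(7))
```

Non-integer values of $a$ and $c$ are chosen on purpose, so that none of the degenerate situations mentioned above can arise.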
Notice that (5.82) is just the special case $a=1$ of (5.93). Therefore we don’t really need to remember (5.82); and we don’t really need the identity (5.9) that led us to (5.82), even though Table 174 said that it was memorable. A computer program for formula manipulation, faced with the problem of evaluating $\sum_{k\le n}\binom{r+k}{k}$, could convert the sum to a hypergeometric and plug into the general identity for Vandermonde’s convolution.
Problem 1 in Section 5.2 asked for the value of
$$\sum_{k\ge0}\binom{m}{k}\bigg/\binom{n}{k}\,.$$
This problem is a natural for hypergeometrics, and after a bit of practice any hypergeometer can read off the parameters immediately as $F(1,-m;-n;1)$. Hmmm; that problem was yet another special takeoff on Vandermonde!
The sum in Problem 2 and Problem 4 likewise yields $F(2,1-n;2-m;1)$. (We need to replace $k$ by $k+1$ first.) And the “menacing” sum in Problem 6 turns out to be just $F(n+1,-n;2;1)$. Is there nothing more to sum, besides disguised versions of Vandermonde’s powerful convolution?
Well, yes, Problem 3 is a bit different. It deals with a special case of the general sum $\sum_k\binom{n-k}{k}z^k$ considered in (5.74), and this leads to a closed-form expression for
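We also proved something new in (5.55), when we looked at the coefficients of $(1-z)^r(1+z)^r$:
$$F\!\left({1-c-2n,\,-2n \atop c}\,\middle|\,-1\right) = (-1)^n\,\frac{(2n)!}{n!}\,\frac{(c-1)!}{(c+n-1)!}\,,\qquad\text{integer } n\ge0.$$
[Margin: Kummer was a summer.]
This is called Kummer’s formula when it’s generalized to complex numbers:
$$F\!\left({a,\,b \atop 1+b-a}\,\middle|\,-1\right) = \frac{(b/2)!}{b!}\,(b-a)^{\underline{b/2}}. \tag{5.94}$$
[Margin: The summer of ’36.]
(Ernst Kummer [187] proved this in 1836.)
It’s interesting to compare these two formulas. Replacing $c$ by $1-2n-a$, we find that the results are consistent if and only if
$$(-1)^n\,\frac{(2n)!}{n!} = \lim_{b\to-2n}\frac{(b/2)!}{b!} = \lim_{x\to-n}\frac{x!}{(2x)!} \tag{5.95}$$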
when $n$ is a positive integer. Suppose, for example, that $n=3$; then we should have $-6!/3! = \lim_{x\to-3} x!/(2x)!$. We know that $(-3)!$ and $(-6)!$ are both infinite; but we might choose to ignore that difficulty and to imagine that $(-3)! = (-3)(-4)(-5)(-6)!$, so that the two occurrences of $(-6)!$ will cancel. Such temptations must, however, be resisted, because they lead to the wrong answer! The limit of $x!/(2x)!$ as $x\to-3$ is not $(-3)(-4)(-5)$ but rather $-6!/3! = (-4)(-5)(-6)$, according to (5.95).
The right way to evaluate the limit in (5.95) is to use equation (5.87), which relates negative-argument factorials to positive-argument Gamma functions. If we replace $x$ by $-n-\epsilon$ and let $\epsilon\to0$, two applications of (5.87) give
$$\frac{(-n-\epsilon)!}{(-2n-2\epsilon)!}\;\frac{\Gamma(n+\epsilon)}{\Gamma(2n+2\epsilon)} = \frac{\sin(2n+2\epsilon)\pi}{\sin(n+\epsilon)\pi}\,.$$
Now $\sin(x+y) = \sin x\cos y + \cos x\sin y$; so this ratio of sines is
$$\frac{\cos 2n\pi\,\sin 2\epsilon\pi}{\cos n\pi\,\sin\epsilon\pi} = (-1)^n\bigl(2+O(\epsilon)\bigr),$$
by the methods of Chapter 9. Therefore, by (5.86), we have
$$\lim_{\epsilon\to0}\frac{(-n-\epsilon)!}{(-2n-2\epsilon)!} = 2(-1)^n\,\frac{\Gamma(2n)}{\Gamma(n)} = 2(-1)^n\,\frac{(2n-1)!}{(n-1)!} = (-1)^n\,\frac{(2n)!}{n!}\,,$$
as desired.
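A quick numerical experiment supports the case $n=3$. The sketch below assumes a floating-point Gamma function is an acceptable stand-in for the generalized factorial, with $x!=\Gamma(x+1)$; the name `factorial_ratio` and the offset $10^{-6}$ are choices made only for illustration.

```python
import math

def factorial_ratio(x):
    """x!/(2x)! computed through the Gamma function, x! = Gamma(x + 1)."""
    return math.gamma(x + 1) / math.gamma(2 * x + 1)

near_limit = factorial_ratio(-3 - 1e-6)  # x slightly away from the pole at -3
tempting_wrong_answer = (-3) * (-4) * (-5)
right_answer = (-4) * (-5) * (-6)
```

The computed ratio lands close to `right_answer` and nowhere near `tempting_wrong_answer`, which is the point of the warning about cancelling infinite factorials.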
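Let’s complete our survey by restating the other identities we’ve seen so far in this chapter, clothing them in hypergeometric garb. The triple-binomial sum in (5.29) can be written
$$F\!\left({1-a-2n,\,1-b-2n,\,-2n \atop a,\,b}\,\middle|\,1\right) = (-1)^n\,\frac{(2n)!}{n!}\,\frac{(a+b+2n-1)^{\overline{n}}}{a^{\overline{n}}\,b^{\overline{n}}}\,,\qquad\text{integer } n\ge0.$$
When this one is generalized to complex numbers, it is called Dixon’s formula:
$$F\!\left({a,\,b,\,c \atop 1+c-a,\,1+c-b}\,\middle|\,1\right) = \frac{(c/2)!}{c!}\,\frac{(c-a)^{\underline{c/2}}\,(c-b)^{\underline{c/2}}}{(c-a-b)^{\underline{c/2}}}\,,\qquad \Re a+\Re b < 1+\Re c/2. \tag{5.96}$$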
One of the most general formulas we’ve encountered is the triple-binomial sum (5.28), which yields Saalschütz’s identity:
$$F\!\left({a,\,b,\,-n \atop c,\,a+b-c-n+1}\,\middle|\,1\right) = \frac{(c-a)^{\overline{n}}\,(c-b)^{\overline{n}}}{c^{\overline{n}}\,(c-a-b)^{\overline{n}}} = \frac{(a-c)^{\underline{n}}\,(b-c)^{\underline{n}}}{(-c)^{\underline{n}}\,(a+b-c)^{\underline{n}}}\,,\qquad\text{integer } n\ge0. \tag{5.97}$$
This formula gives the value at $z=1$ of the general hypergeometric series with three upper parameters and two lower parameters, provided that one of the upper parameters is a nonpositive integer and that $b_1+b_2 = a_1+a_2+a_3+1$. (If the sum of the lower parameters exceeds the sum of the upper parameters by 2 instead of by 1, the formula of exercise 25 can be used to express $F(a_1,a_2,a_3;b_1,b_2;1)$ in terms of two hypergeometrics that satisfy Saalschütz’s identity.)
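Saalschütz’s identity can be spot-checked in the same exact-arithmetic style as before. In this sketch the names `rising`, `saalschutz_lhs`, and `saalschutz_rhs` are hypothetical, and the sample parameters are assumed to be non-integers so that no rising power in a denominator is zero.

```python
from fractions import Fraction

def rising(x, n):
    """Rising factorial power x(x+1)...(x+n-1), computed exactly."""
    result = Fraction(1)
    for j in range(n):
        result *= x + j
    return result

def saalschutz_lhs(a, b, c, n):
    """F(a, b, -n; c, a+b-c-n+1 | 1), a finite sum of n+1 terms."""
    d = a + b - c - n + 1
    return sum(rising(a, k) * rising(b, k) * rising(-n, k)
               / (rising(c, k) * rising(d, k) * rising(1, k))
               for k in range(n + 1))

def saalschutz_rhs(a, b, c, n):
    """The closed form in rising powers."""
    return rising(c - a, n) * rising(c - b, n) / (rising(c, n) * rising(c - a - b, n))

a, b, c = Fraction(1, 2), Fraction(1, 3), Fraction(5, 7)
agrees = all(saalschutz_lhs(a, b, c, n) == saalschutz_rhs(a, b, c, n) for n in range(6))
```

The second lower parameter is computed from the others, which enforces the balancing condition $b_1+b_2=a_1+a_2+a_3+1$ automatically.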
Our hard-won identity in Problem 8 of Section 5.2 reduces to
$$\frac{1}{1+x}\,F\!\left({x+1,\,n+1,\,-n \atop 1,\,x+2}\,\middle|\,1\right) = (-1)^n\,x^{\underline{n}}\,x^{\underline{-n-1}}.$$
Sigh. This is just the special case $c=1$ of Saalschütz’s identity (5.97), so we could have saved a lot of work by going to hypergeometrics directly!
[Margin: (Historical note: The great relevance of hypergeometric series to binomial coefficient identities was first pointed out by George Andrews in 1974 [9, section 5].)]
What about Problem 7? That extra-menacing sum gives us the formula
$$F\!\left({n+1,\,m-n,\,1,\,\frac12 \atop \frac12m+1,\,\frac12m+\frac12,\,2}\,\middle|\,1\right) = \frac{m}{n}\,,$$
which is the first case we’ve seen with three lower parameters. So it looks new. But it really isn’t; the left-hand side can be replaced by
$$F\!\left({n,\,m-n-1,\,-\frac12 \atop \frac12m,\,\frac12m-\frac12}\,\middle|\,1\right) - 1\,,$$
using exercise 26, and Saalschütz’s identity wins again.
Well, that’s another deflating experience, but it’s also another reason to appreciate the power of hypergeometric methods.
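The convolution identities in Table 202 do not have hypergeometric equivalents, because their term ratios are rational functions of $k$ only when $t$ is an integer. Equations (5.64) and (5.65) aren’t hypergeometric even when $t=1$. But we can take note of what (5.62) tells us when $t$ has small integer values:
$$F\!\left({\frac12r,\,\frac12r+\frac12,\,-n,\,-n-s \atop r+1,\,-n-\frac12s,\,-n-\frac12s+\frac12}\,\middle|\,1\right) = \binom{r+s+2n}{n}\bigg/\binom{s+2n}{n}\,;$$
$$F\!\left({\frac13r,\,\frac13r+\frac13,\,\frac13r+\frac23,\,-n,\,-n-\frac12s,\,-n-\frac12s+\frac12 \atop \frac12r+\frac12,\,\frac12r+1,\,-n-\frac13s,\,-n-\frac13s+\frac13,\,-n-\frac13s+\frac23}\,\middle|\,1\right) = \binom{r+s+3n}{n}\bigg/\binom{s+3n}{n}\,.$$
The first of these formulas gives the result of Problem 7 again, when the quantities $(r,s,n)$ are replaced respectively by $(1,\,2n+1-m,\,-1-n)$.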
Finally, the “unexpected” sum (5.20) gives us an unexpected hypergeometric identity that turns out to be quite instructive. Let’s look at it in slow motion. First we convert to an infinite sum,
$$\sum_{k\le m}\binom{m+k}{k}2^{-k} = 2^m \quad\Longleftrightarrow\quad \sum_{k\ge0}\binom{2m-k}{m-k}2^k = 2^{2m}.$$
The term ratio from $(2m-k)!\,2^k/m!\,(m-k)!$ is $2(k-m)/(k-2m)$, so we have a hypergeometric identity with $z=2$:
$$\binom{2m}{m}\,F\!\left({1,\,-m \atop -2m}\,\middle|\,2\right) = 2^{2m},\qquad\text{integer } m\ge0. \tag{5.98}$$
But look at the lower parameter ‘$-2m$’. Negative integers are verboten, so this identity is undefined!
It’s high time to look at such limiting cases carefully, as promised earlier, because degenerate hypergeometrics can often be evaluated by approaching them from nearby nondegenerate points. We must be careful when we do this, because different results can be obtained if we take limits in different ways. For example, here are two limits that turn out to be quite different when one of the upper parameters is increased by $\epsilon$:
$$\lim_{\epsilon\to0} F\!\left({-1+\epsilon,\,-3 \atop -2+\epsilon}\,\middle|\,1\right) = \lim_{\epsilon\to0}\left(1 + \frac{(-1+\epsilon)(-3)}{(-2+\epsilon)\,1!} + \frac{(-1+\epsilon)(\epsilon)(-3)(-2)}{(-2+\epsilon)(-1+\epsilon)\,2!} + \frac{(-1+\epsilon)(\epsilon)(1+\epsilon)(-3)(-2)(-1)}{(-2+\epsilon)(-1+\epsilon)(\epsilon)\,3!}\right) = 1-\tfrac32+0+\tfrac12 = 0\,;$$
$$\lim_{\epsilon\to0} F\!\left({-1,\,-3 \atop -2+\epsilon}\,\middle|\,1\right) = \lim_{\epsilon\to0}\left(1 + \frac{(-1)(-3)}{(-2+\epsilon)\,1!} + 0 + 0\right) = 1-\tfrac32+0+0 = -\tfrac12\,.$$
Similarly, we have defined $\binom{-1}{-1} = 0 = \lim_{\epsilon\to0}\binom{-1+\epsilon}{-1}$; this is not the same as $\lim_{\epsilon\to0}\binom{-1+\epsilon}{-1+\epsilon} = 1$. The proper way to treat (5.98) as a limit is to realize that the upper parameter $-m$ is being used to make all terms of the series $\sum_{k\ge0}\binom{2m-k}{m-k}2^k$ zero for $k>m$; this means that we want to make the following more precise statement:
$$\binom{2m}{m}\,\lim_{\epsilon\to0}F\!\left({1,\,-m \atop -2m+\epsilon}\,\middle|\,2\right) = 2^{2m},\qquad\text{integer } m\ge0. \tag{5.99}$$
Each term of this limit is well defined, because the denominator factor $(-2m)^{\overline{k}}$ does not become zero until $k>2m$. Therefore this limit gives us exactly the sum (5.20) we began with.
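The two different limits can be watched numerically by choosing a small rational $\epsilon$ and summing the four relevant terms exactly. This is a sketch under the assumption that $\epsilon=10^{-6}$ is small enough to show the trend; `hyp_partial` is a hypothetical helper, not notation from the text.

```python
from fractions import Fraction

def hyp_partial(upper, lower, z, terms):
    """Exact sum of the first `terms` terms of F(upper; lower | z)."""
    total, t = Fraction(0), Fraction(1)
    for k in range(terms):
        total += t
        if k + 1 < terms:  # build t_{k+1} from t_k via the term ratio
            num = Fraction(z)
            for a in upper:
                num *= a + k
            den = Fraction(k + 1)
            for b in lower:
                den *= b + k
            t *= num / den
    return total

eps = Fraction(1, 10**6)
# Both series stop after k = 3 because of the upper parameter -3.
first = hyp_partial([-1 + eps, -3], [-2 + eps], 1, 4)
second = hyp_partial([-1, -3], [-2 + eps], 1, 4)
```

Here `first` is within a tiny distance of $0$ while `second` is within a tiny distance of $-\frac12$, even though the two hypergeometrics differ only in whether the upper parameter $-1$ was nudged.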
5.6 HYPERGEOMETRIC TRANSFORMATIONS
It should be clear by now that a database of known hypergeometric
closed forms is a useful tool for doing sums of binomial coefficients. We
simply convert any given sum into its canonical hypergeometric form, then
look it up in the table. If it’s there, fine, we’ve got the answer. If not, we can
add it to the database if the sum turns out to be expressible in closed form.
We might also include entries in the table that say, “This sum does not have a
simple closed form in general.” For example, the sum $\sum_{k\le m}\binom{n}{k}$ corresponds to the hypergeometric
$$\binom{n}{m}\,F\!\left({1,\,-m \atop n-m+1}\,\middle|\,-1\right),\qquad\text{integers } n\ge m\ge0; \tag{5.100}$$
this has a simple closed form only if $m$ is near $0$, $\frac12n$, or $n$.
[Margin: The hypergeometric database should really be a “knowledge base.”]
But there’s more to the story, since hypergeometric functions also obey identities of their own. This means that every closed form for hypergeometrics leads to additional closed forms and to additional entries in the database. For example, the identities in exercises 25 and 26 tell us how to transform one hypergeometric into two others with similar but different parameters. These can in turn be transformed again.
In 1793, J. F. Pfaff discovered a surprising reflection law,
$$\frac{1}{(1-z)^a}\,F\!\left({a,\,b \atop c}\,\middle|\,\frac{-z}{1-z}\right) = F\!\left({a,\,c-b \atop c}\,\middle|\,z\right), \tag{5.101}$$
which is a transformation of another type. This is a formal identity in power series, if the quantity $(-z)^k/(1-z)^{k+a}$ is replaced by the infinite series $(-z)^k\bigl(1+\binom{k+a}{1}z+\binom{k+a+1}{2}z^2+\cdots\bigr)$ when the left-hand side is expanded (see exercise 50). We can use this law to derive new formulas from the identities we already know, when $z\ne1$.
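Although (5.101) is stated as a formal power series identity, it can also be observed numerically at a point where both series converge. The sketch below assumes floating-point partial sums of 200 terms are accurate enough at $z=0.2$; the names `hyp_float` and `pfaff_sides` and the sample parameters are illustrative choices.

```python
def hyp_float(upper, lower, z, terms=200):
    """Floating-point partial sum of F(upper; lower | z); assumes |z| < 1."""
    total, t = 0.0, 1.0
    for k in range(terms):
        total += t
        ratio = z / (k + 1)
        for a in upper:
            ratio *= a + k
        for b in lower:
            ratio /= b + k
        t *= ratio
    return total

def pfaff_sides(a, b, c, z):
    """Left and right sides of the reflection law (5.101)."""
    left = (1 - z) ** (-a) * hyp_float([a, b], [c], -z / (1 - z))
    right = hyp_float([a, c - b], [c], z)
    return left, right

left, right = pfaff_sides(0.3, 1.7, 2.2, 0.2)
```

With $z=0.2$ the left-hand series is evaluated at $-0.25$, so both sides converge geometrically and the two numbers agree to within rounding error.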
For example, Kummer’s formula (5.94) can be combined with the reflection law (5.101) if we choose the parameters so that both identities apply:
$$2^{-a}\,F\!\left({a,\,1-a \atop 1+b-a}\,\middle|\,\tfrac12\right) = F\!\left({a,\,b \atop 1+b-a}\,\middle|\,-1\right) = \frac{(b/2)!}{b!}\,(b-a)^{\underline{b/2}}. \tag{5.102}$$
We can now set $a=-n$ and go back from this equation to a new identity in binomial coefficients that we might need some day:
$$\sum_{k\ge0}\binom{n}{k}\Bigl(\frac{-1}{2}\Bigr)^{k}\binom{n+k}{k}\bigg/\binom{n+b+k}{k} = 2^{-n}\,\frac{(b/2)!\,(b+n)!}{b!\,(b/2+n)!}\,,\qquad\text{integer } n\ge0. \tag{5.103}$$
For example, when $n=3$ this identity says that
$$1 - 3\,\frac{4}{2(4+b)} + 3\,\frac{4\cdot5}{4(4+b)(5+b)} - \frac{4\cdot5\cdot6}{8(4+b)(5+b)(6+b)} = \frac{(b+3)(b+2)(b+1)}{(b+6)(b+4)(b+2)}\,.$$
It’s almost unbelievable, but true, for all $b$. (Except when a factor in the denominator vanishes.)
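This is fun; let’s try again. Maybe we’ll find a formula that will really astonish our friends. What does Pfaff’s reflection law tell us if we apply it to the strange form (5.99), where $z=2$? In this case we set $a=-m$, $b=1$, and $c=-2m+\epsilon$, obtaining
$$(-1)^m\lim_{\epsilon\to0}F\!\left({-m,\,1 \atop -2m+\epsilon}\,\middle|\,2\right) = \lim_{\epsilon\to0}F\!\left({-m,\,-2m-1+\epsilon \atop -2m+\epsilon}\,\middle|\,2\right) = \lim_{\epsilon\to0}\sum_{k\ge0}\frac{(-m)^{\overline{k}}\,(-2m-1+\epsilon)^{\overline{k}}}{(-2m+\epsilon)^{\overline{k}}}\,\frac{2^k}{k!} = \sum_{k\le m}\binom{m}{k}\frac{2m+1}{2m+1-k}\,(-2)^k,$$
because none of the limiting terms is close to zero. This leads to another miraculous formula,
$$\sum_{k\le m}\binom{m}{k}\frac{2m+1}{2m+1-k}\,(-2)^k = (-1)^m\,2^{2m}\bigg/\binom{2m}{m} = 1\bigg/\binom{-1/2}{m}\,,\qquad\text{integer } m\ge0. \tag{5.104}$$
When $m=3$, for example, the sum is
$$1 - 7 + \frac{84}{5} - 14 = -\frac{16}{5}\,,$$
and $\binom{-1/2}{3}$ is indeed equal to $-\frac{5}{16}$.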
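Since (5.104) involves only rational numbers, it can be confirmed exactly for as many values of $m$ as we care to try. A minimal sketch follows; the function names `lhs_5_104` and `rhs_5_104` are hypothetical labels for the two sides.

```python
from fractions import Fraction
from math import comb

def lhs_5_104(m):
    """Sum over k <= m of binom(m,k) (2m+1)/(2m+1-k) (-2)^k."""
    return sum(Fraction(comb(m, k) * (2 * m + 1), 2 * m + 1 - k) * (-2) ** k
               for k in range(m + 1))

def rhs_5_104(m):
    """(-1)^m 2^(2m) / binom(2m, m)."""
    return Fraction((-1) ** m * 4 ** m, comb(2 * m, m))

agrees = all(lhs_5_104(m) == rhs_5_104(m) for m in range(10))
```

The denominators $2m+1-k$ are never zero for $0\le k\le m$, which is the finite counterpart of the remark that none of the limiting terms is close to zero.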
When we looked at our binomial coefficient identities and converted them to hypergeometric form, we overlooked (5.19) because it was a relation between two sums instead of a closed form. But now we can regard (5.19) as an identity between hypergeometric series. If we differentiate it $n$ times with respect to $y$ and then replace $k$ by $m-n-k$, we get
$$\sum_{k\ge0}\binom{m+r}{m-n-k}\binom{n+k}{n}x^{m-n-k}y^k = \sum_{k\ge0}\binom{-r}{m-n-k}\binom{n+k}{n}(-x)^{m-n-k}(x+y)^k.$$
This yields the following hypergeometric transformation:
$$F\!\left({a,\,-n \atop c}\,\middle|\,z\right) = \frac{(a-c)^{\underline{n}}}{(-c)^{\underline{n}}}\,F\!\left({a,\,-n \atop 1-n+a-c}\,\middle|\,1-z\right),\qquad\text{integer } n\ge0. \tag{5.105}$$
Notice that when $z=1$ this reduces to Vandermonde’s convolution, (5.93).
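Differentiation seems to be useful, if this example is any indication; we also found it helpful in Chapter 2, when summing $x+2x^2+\cdots+nx^n$. Let’s see what happens when a general hypergeometric series is differentiated with respect to $z$:
$$\frac{d}{dz}F\!\left({a_1,\ldots,a_m \atop b_1,\ldots,b_n}\,\middle|\,z\right) = \sum_{k\ge1}\frac{a_1^{\overline{k}}\ldots a_m^{\overline{k}}\,z^{k-1}}{b_1^{\overline{k}}\ldots b_n^{\overline{k}}\,(k-1)!} = \sum_{k\ge0}\frac{a_1(a_1+1)^{\overline{k}}\ldots a_m(a_m+1)^{\overline{k}}\,z^k}{b_1(b_1+1)^{\overline{k}}\ldots b_n(b_n+1)^{\overline{k}}\,k!} = \frac{a_1\ldots a_m}{b_1\ldots b_n}\,F\!\left({a_1+1,\ldots,a_m+1 \atop b_1+1,\ldots,b_n+1}\,\middle|\,z\right). \tag{5.106}$$
The parameters move out and shift up.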
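It’s also possible to use differentiation to tweak just one of the parameters while holding the rest of them fixed. For this we use the operator
$$\vartheta = z\,\frac{d}{dz}\,,$$
which acts on a function by differentiating it and then multiplying by $z$. This operator gives
$$\vartheta F = \sum_{k\ge0}\frac{k\,a_1^{\overline{k}}\ldots a_m^{\overline{k}}\,z^k}{b_1^{\overline{k}}\ldots b_n^{\overline{k}}\,k!}\,,$$
which by itself isn’t too useful. But if we multiply $F$ by one of its upper parameters, say $a_1$, and add $\vartheta F$, we get
$$(\vartheta+a_1)F = \sum_{k\ge0}\frac{(k+a_1)\,a_1^{\overline{k}}\ldots a_m^{\overline{k}}\,z^k}{b_1^{\overline{k}}\ldots b_n^{\overline{k}}\,k!} = \sum_{k\ge0}\frac{a_1(a_1+1)^{\overline{k}}\,a_2^{\overline{k}}\ldots a_m^{\overline{k}}\,z^k}{b_1^{\overline{k}}\ldots b_n^{\overline{k}}\,k!} = a_1\,F\!\left({a_1+1,\,a_2,\ldots,a_m \atop b_1,\ldots,b_n}\,\middle|\,z\right).$$
Only one parameter has been shifted.
[Margin: How do you pronounce $\vartheta$? (Dunno, but TeX calls it ‘vartheta’.)]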
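A similar trick works with lower parameters, but in this case things shift down instead of up:
$$(\vartheta+b_1-1)F = \sum_{k\ge0}\frac{(k+b_1-1)\,a_1^{\overline{k}}\ldots a_m^{\overline{k}}\,z^k}{b_1^{\overline{k}}\ldots b_n^{\overline{k}}\,k!} = \sum_{k\ge0}\frac{(b_1-1)\,a_1^{\overline{k}}\ldots a_m^{\overline{k}}\,z^k}{(b_1-1)^{\overline{k}}\,b_2^{\overline{k}}\ldots b_n^{\overline{k}}\,k!} = (b_1-1)\,F\!\left({a_1,\ldots,a_m \atop b_1-1,\,b_2,\ldots,b_n}\,\middle|\,z\right).$$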
We can now combine all these operations and make a mathematical “pun” by expressing the same quantity in two different ways. Namely, we have
$$(\vartheta+a_1)\ldots(\vartheta+a_m)F = a_1\ldots a_m\,F\!\left({a_1+1,\ldots,a_m+1 \atop b_1,\ldots,b_n}\,\middle|\,z\right)$$
and
$$(\vartheta+b_1-1)\ldots(\vartheta+b_n-1)F = (b_1-1)\ldots(b_n-1)\,F\!\left({a_1,\ldots,a_m \atop b_1-1,\ldots,b_n-1}\,\middle|\,z\right),$$
where $F = F(a_1,\ldots,a_m;b_1,\ldots,b_n;z)$. And (5.106) tells us that the top line is the derivative of the bottom line. Therefore the general hypergeometric function $F$ satisfies the differential equation
$$D(\vartheta+b_1-1)\ldots(\vartheta+b_n-1)F = (\vartheta+a_1)\ldots(\vartheta+a_m)F, \tag{5.107}$$
where $D$ is the operator $\frac{d}{dz}$.
[Margin: Ever hear the one about the brothers who named their cattle ranch Focus, because it’s where the sons raise meat?]
This cries out for an example. Let’s find the differential equation satisfied by the standard 2-over-1 hypergeometric series $F(z) = F(a,b;c;z)$. According to (5.107), we have
$$D(\vartheta+c-1)F = (\vartheta+a)(\vartheta+b)F.$$
What does this mean in ordinary notation? Well, $(\vartheta+c-1)F$ is $zF'(z)+(c-1)F(z)$, and the derivative of this gives the left-hand side,
$$F'(z) + zF''(z) + (c-1)F'(z).$$
On the right-hand side we have
$$(\vartheta+a)\bigl(zF'(z)+bF(z)\bigr) = z\,\frac{d}{dz}\bigl(zF'(z)+bF(z)\bigr) + a\bigl(zF'(z)+bF(z)\bigr) = zF'(z)+z^2F''(z)+bzF'(z)+azF'(z)+abF(z).$$
Equating the two sides tells us that
$$z(1-z)F''(z) + \bigl(c-z(a+b+1)\bigr)F'(z) - abF(z) = 0. \tag{5.108}$$
This equation is equivalent to the factored form (5.107).
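One way to convince ourselves of (5.108) is to substitute the power series and look at the coefficient of each $z^k$ separately. The sketch below does this exactly; the helper names `series_coeffs` and `ode_residual` and the sample parameters are assumptions made for the illustration.

```python
from fractions import Fraction

def series_coeffs(a, b, c, count):
    """First `count` coefficients t_k of F(a, b; c; z), with t_0 = 1."""
    coeffs = [Fraction(1)]
    for k in range(count - 1):
        coeffs.append(coeffs[-1] * (a + k) * (b + k) / ((c + k) * (k + 1)))
    return coeffs

def ode_residual(a, b, c, t, k):
    """Coefficient of z**k in z(1-z)F'' + (c - z(a+b+1))F' - abF."""
    return ((k + 1) * k * t[k + 1]          # from  z F''
            - k * (k - 1) * t[k]            # from -z^2 F''
            + c * (k + 1) * t[k + 1]        # from  c F'
            - (a + b + 1) * k * t[k]        # from -(a+b+1) z F'
            - a * b * t[k])                 # from -ab F

a, b, c = Fraction(1, 2), Fraction(2, 3), Fraction(5, 4)
t = series_coeffs(a, b, c, 12)
residuals = [ode_residual(a, b, c, t, k) for k in range(11)]
```

Every residual is exactly zero, because the five contributions collapse to $(k+1)(k+c)\,t_{k+1} - (k+a)(k+b)\,t_k$, which is the term-ratio relation in disguise.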
Conversely, we can go back from the differential equation to the power series. Let’s assume that $F(z) = \sum_{k\ge0}t_kz^k$ is a power series satisfying (5.107). A straightforward calculation shows that we must have
$$\frac{t_{k+1}}{t_k} = \frac{(k+a_1)\ldots(k+a_m)}{(k+b_1)\ldots(k+b_n)(k+1)}\,;$$
hence $F(z)$ must be $t_0\,F(a_1,\ldots,a_m;b_1,\ldots,b_n;z)$. We’ve proved that the hypergeometric series (5.76) is the only formal power series that satisfies the differential equation (5.107) and has the constant term 1.
It would be nice if hypergeometrics solved all the world’s differential equations, but they don’t quite. The right-hand side of (5.107) always expands into a sum of terms of the form $\alpha_kz^kF^{(k)}(z)$, where $F^{(k)}(z)$ is the $k$th derivative $D^kF(z)$; the left-hand side always expands into a sum of terms of the form $\beta_kz^{k-1}F^{(k)}(z)$ with $k>0$. So the differential equation (5.107) always takes the special form
$$z^{n-1}(\beta_n-z\alpha_n)F^{(n)}(z) + \cdots + (\beta_1-z\alpha_1)F'(z) - \alpha_0F(z) = 0.$$
[Margin: The function $F(z) = (1-z)^r$ satisfies $\vartheta F = z(\vartheta-r)F$. This gives another proof of the binomial theorem.]
Equation (5.108) illustrates this in the case $n=2$. Conversely, we will prove in exercise 6.13 that any differential equation of this form can be factored in terms of the $\vartheta$ operator, to give an equation like (5.107). So these are the differential equations whose solutions are power series with rational term ratios.
Multiplying both sides of (5.107) by $z$ dispenses with the $D$ operator and gives us an instructive all-$\vartheta$ form,
$$\vartheta(\vartheta+b_1-1)\ldots(\vartheta+b_n-1)F = z(\vartheta+a_1)\ldots(\vartheta+a_m)F.$$
The first factor $\vartheta = (\vartheta+1-1)$ on the left corresponds to the $(k+1)$ in the term ratio (5.81), which corresponds to the $k!$ in the denominator of the $k$th term in a general hypergeometric series. The other factors $(\vartheta+b_j-1)$ correspond to the denominator factor $(k+b_j)$, which corresponds to $b_j^{\overline{k}}$ in (5.76). On the right, the $z$ corresponds to $z^k$, and $(\vartheta+a_j)$ corresponds to $a_j^{\overline{k}}$.
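One use of this differential theory is to find and prove new transformations. For example, we can readily verify that both of the hypergeometrics
$$F\!\left({2a,\,2b \atop a+b+\frac12}\,\middle|\,z\right)\qquad\text{and}\qquad F\!\left({a,\,b \atop a+b+\frac12}\,\middle|\,4z(1-z)\right)$$
satisfy the differential equation
$$z(1-z)F''(z) + (a+b+\tfrac12)(1-2z)F'(z) - 4abF(z) = 0;$$
hence Gauss’s identity [116, equation 102]
$$F\!\left({2a,\,2b \atop a+b+\frac12}\,\middle|\,z\right) = F\!\left({a,\,b \atop a+b+\frac12}\,\middle|\,4z(1-z)\right) \tag{5.110}$$
must be true. In particular,
$$F\!\left({2a,\,2b \atop a+b+\frac12}\,\middle|\,\tfrac12\right) = F\!\left({a,\,b \atop a+b+\frac12}\,\middle|\,1\right), \tag{5.111}$$
whenever both infinite sums converge.
[Margin: (Caution: We can’t use (5.110) safely when $|z|>1/2$, unless both sides are polynomials; see exercise 53.)]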
Every new identity for hypergeometrics has consequences for binomial coefficients, and this one is no exception. Let’s consider the sum
$$\sum_{k\le m}\binom{m-k}{n}\binom{m+n+1}{k}\Bigl(\frac{-1}{2}\Bigr)^{k},\qquad\text{integers } m\ge n\ge0.$$
The terms are nonzero for $0\le k\le m-n$, and with a little delicate limit-taking as before we can express this sum as the hypergeometric
$$\lim_{\epsilon\to0}\binom{m}{n}\,F\!\left({n-m,\,-n-m-1+\alpha\epsilon \atop -m+\epsilon}\,\middle|\,\tfrac12\right).$$
The value of $\alpha$ doesn’t affect the limit, since the nonpositive upper parameter $n-m$ cuts the sum off early. We can set $\alpha=2$, so that (5.111) applies. The limit can now be evaluated because the right-hand side is a special case of (5.92). The result can be expressed in simplified form,
$$\sum_{k\le m}\binom{m-k}{n}\binom{m+n+1}{k}\Bigl(\frac{-1}{2}\Bigr)^{k} = \binom{(m+n)/2}{n}\,2^{n-m}\,[m+n \text{ is even}],\qquad\text{integers } m\ge n\ge0, \tag{5.112}$$
as shown in exercise 54. For example, when $m=5$ and $n=2$ we get $\binom52\binom80-\binom42\binom81/2+\binom32\binom82/4-\binom22\binom83/8 = 10-24+21-7 = 0$; when $m=4$ and $n=2$, both sides give $\frac34$.
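Identity (5.112) is another one that is pleasant to confirm by brute force over a range of $m$ and $n$. In this sketch the names `lhs_5_112` and `rhs_5_112` are hypothetical, and the bracket $[m+n \text{ is even}]$ is implemented as an explicit parity test.

```python
from fractions import Fraction
from math import comb

def lhs_5_112(m, n):
    """Sum over k <= m of binom(m-k, n) binom(m+n+1, k) (-1/2)^k."""
    return sum(comb(m - k, n) * comb(m + n + 1, k) * Fraction(-1, 2) ** k
               for k in range(m + 1))

def rhs_5_112(m, n):
    """binom((m+n)/2, n) 2^(n-m) when m+n is even, otherwise 0."""
    if (m + n) % 2:
        return Fraction(0)
    return comb((m + n) // 2, n) * Fraction(2) ** (n - m)

agrees = all(lhs_5_112(m, n) == rhs_5_112(m, n)
             for m in range(9) for n in range(m + 1))
```

The two worked cases from the text, $(m,n)=(5,2)$ and $(4,2)$, come out as $0$ and $\frac34$ respectively.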
We can also find cases where (5.110) gives binomial sums when $z=-1$, but these are really weird. If we set $a=\frac16-\frac n3$ and $b=-n$, we get the monstrous formula
$$F\!\left({\frac13-\frac23n,\,-2n \atop \frac23-\frac43n}\,\middle|\,-1\right) = F\!\left({\frac16-\frac13n,\,-n \atop \frac23-\frac43n}\,\middle|\,-8\right).$$
These hypergeometrics are nondegenerate polynomials when $n\not\equiv2\pmod 3$; and the parameters have been cleverly chosen so that the left-hand side can be evaluated by (5.94). We are therefore led to a truly mind-boggling result,
$$\sum_{k\le n}\binom{n}{k}\binom{\frac13n-\frac16}{k}\,8^k\bigg/\binom{\frac43n-\frac23}{k} = \binom{2n}{n}\bigg/\binom{\frac43n-\frac23}{n}\,,\qquad\text{integer } n\ge0,\ n\not\equiv2\pmod 3. \tag{5.113}$$
This is the most startling identity in binomial coefficients that we’ve seen. Small cases of the identity aren’t even easy to check by hand. (It turns out that both sides do give $\frac{81}{7}$ when $n=3$.) But the identity is completely useless, of course; surely it will never arise in a practical problem.
[Margin: The only use of (5.113) is to demonstrate the existence of incredibly useless identities.]
So that’s our hype for hypergeometrics. We’ve seen that hypergeometric series provide a high-level way to understand what’s going on in binomial coefficient sums. A great deal of additional information can be found in the classic book by Wilfred N. Bailey [15] and its sequel by Lucy Joan Slater [269].
5.7 PARTIAL HYPERGEOMETRIC SUMS
Most of the sums we’ve evaluated in this chapter range over all indices $k\ge0$, but sometimes we’ve been able to find a closed form that works over a general range $0\le k < m$. For example, we know from (5.16) that
$$\sum_{k < m}\binom{n}{k}(-1)^k = (-1)^{m-1}\binom{n-1}{m-1},\qquad\text{integer } m. \tag{5.114}$$
The theory in Chapter 2 gives us a nice way to understand formulas like this: If $f(k) = \Delta g(k) = g(k+1)-g(k)$, then we’ve agreed to write $\sum f(k)\,\delta k = g(k)+C$, and
$$\sum\nolimits_a^b f(k)\,\delta k = g(k)\Big|_a^b = g(b)-g(a).$$
Furthermore, when $a$ and $b$ are integers with $a\le b$, we have
$$\sum\nolimits_a^b f(k)\,\delta k = \sum_{a\le k < b}f(k) = g(b)-g(a).$$
Therefore identity (5.114) corresponds to the indefinite summation formula
$$\sum\binom{n}{k}(-1)^k\,\delta k = (-1)^{k-1}\binom{n-1}{k-1} + C$$
and to the difference formula
$$\Delta\Bigl((-1)^k\binom{n}{k}\Bigr) = (-1)^{k+1}\binom{n+1}{k+1}.$$
It’s easy to start with a function $g(k)$ and to compute $\Delta g(k) = f(k)$, a function whose sum will be $g(k)+C$. But it’s much harder to start with $f(k)$ and to figure out its indefinite sum $\sum f(k)\,\delta k = g(k)+C$; this function $g$ might not have a simple form. For example, there is apparently no simple form for $\sum\binom{n}{k}\,\delta k$; otherwise we could evaluate sums like $\sum_{k\le n/3}\binom{n}{k}$, about which we’re clueless.
In 1977, R. W. Gosper [124] discovered a beautiful way to decide whether a given function is indefinitely summable with respect to a general class of functions called hypergeometric terms. Let us write
$$F\!\left({a_1,\ldots,a_m \atop b_1,\ldots,b_n}\,\middle|\,z\right)_{\!k} = \frac{a_1^{\overline{k}}\ldots a_m^{\overline{k}}}{b_1^{\overline{k}}\ldots b_n^{\overline{k}}}\,\frac{z^k}{k!} \tag{5.115}$$
for the $k$th term of the hypergeometric series $F(a_1,\ldots,a_m;b_1,\ldots,b_n;z)$. We will regard $F(a_1,\ldots,a_m;b_1,\ldots,b_n;z)_k$ as a function of $k$, not of $z$. Gosper’s decision procedure allows us to decide if there exist parameters $c$, $A_1,\ldots,A_M$, $B_1,\ldots,B_N$, and $Z$ such that
$$\sum F\!\left({a_1,\ldots,a_m \atop b_1,\ldots,b_n}\,\middle|\,z\right)_{\!k}\delta k = c\,F\!\left({A_1,\ldots,A_M \atop B_1,\ldots,B_N}\,\middle|\,Z\right)_{\!k} + C, \tag{5.116}$$
given $a_1,\ldots,a_m$, $b_1,\ldots,b_n$, and $z$. We will say that a given function $F(a_1,\ldots,a_m;b_1,\ldots,b_n;z)_k$ is summable in hypergeometric terms if such constants $c$, $A_1,\ldots,A_M$, $B_1,\ldots,B_N$, $Z$ exist.
Let’s write $t(k)$ and $T(k)$ as abbreviations for $F(a_1,\ldots,a_m;b_1,\ldots,b_n;z)_k$ and $F(A_1,\ldots,A_M;B_1,\ldots,B_N;Z)_k$, respectively. The first step in Gosper’s decision procedure is to express the term ratio
$$\frac{t(k+1)}{t(k)} = \frac{(k+a_1)\ldots(k+a_m)\,z}{(k+b_1)\ldots(k+b_n)(k+1)}$$
in the special form
$$\frac{t(k+1)}{t(k)} = \frac{p(k+1)}{p(k)}\,\frac{q(k)}{r(k+1)} \tag{5.117}$$
where $p$, $q$, and $r$ are polynomials subject to the following condition:
$$(k+\alpha)\backslash q(k)\ \text{ and }\ (k+\beta)\backslash r(k) \;\Longrightarrow\; \alpha-\beta \text{ is not a positive integer.} \tag{5.118}$$
[Margin: (Divisibility of polynomials is analogous to divisibility of integers. For example, $(k+\alpha)\backslash q(k)$ means that the quotient $q(k)/(k+\alpha)$ is a polynomial. It’s well known that $(k+\alpha)\backslash q(k)$ if and only if $q(-\alpha)=0$.)]
This condition is easy to achieve: We start by provisionally setting $p(k)=1$, $q(k) = (k+a_1)\ldots(k+a_m)z$, and $r(k) = (k+b_1-1)\ldots(k+b_n-1)k$; then we check if (5.118) is violated. If $q$ and $r$ have factors $(k+\alpha)$ and $(k+\beta)$ where $\alpha-\beta = N > 0$, we divide them out of $q$ and $r$ and replace $p(k)$ by
$$p(k)\,(k+\alpha-1)^{\underline{N-1}} = p(k)\,(k+\alpha-1)(k+\alpha-2)\ldots(k+\beta+1).$$
The new $p$, $q$, and $r$ still satisfy (5.117), and we can repeat this process until (5.118) holds.
Our goal is to find a hypergeometric term $T(k)$ such that
$$t(k) = cT(k+1) - cT(k) \tag{5.119}$$
for some constant $c$. Let’s write
$$cT(k) = \frac{r(k)\,s(k)\,t(k)}{p(k)}\,, \tag{5.120}$$
where $s(k)$ is a secret function that must be discovered somehow. Plugging (5.120) into (5.117) and (5.119) gives us the equation that $s(k)$ must satisfy:
$$p(k) = q(k)\,s(k+1) - r(k)\,s(k). \tag{5.121}$$
[Margin: (Exercise 55 explains why we might want to make this magic substitution.)]
If we can find $s(k)$ satisfying this recurrence, we’ve found $\sum t(k)\,\delta k$.
We’re assuming that $T(k+1)/T(k)$ is a rational function of $k$. Therefore, by (5.120) and (5.119), $r(k)s(k)/p(k) = T(k)/\bigl(T(k+1)-T(k)\bigr)$ is a rational function of $k$, and $s(k)$ itself must be a quotient of polynomials:
$$s(k) = f(k)/g(k). \tag{5.122}$$
But in fact we can prove that $s(k)$ is itself a polynomial. For if $g(k)\ne1$, and if $f(k)$ and $g(k)$ have no common factors, let $N$ be the largest integer such that $(k+\beta)$ and $(k+\beta+N-1)$ both occur as factors of $g(k)$ for some complex number $\beta$. The value of $N$ is positive, since $N=1$ always satisfies this condition. Equation (5.121) can be rewritten
$$p(k)\,g(k+1)\,g(k) = q(k)\,f(k+1)\,g(k) - r(k)\,g(k+1)\,f(k),$$
and if we set $k=-\beta$ and $k=-\beta-N$ we get
$$r(-\beta)\,g(1-\beta)\,f(-\beta) = 0 = q(-\beta-N)\,f(1-\beta-N)\,g(-\beta-N).$$
Now $f(-\beta)\ne0$ and $f(1-\beta-N)\ne0$, because $f$ and $g$ have no common roots. Also $g(1-\beta)\ne0$ and $g(-\beta-N)\ne0$, because $g(k)$ would otherwise contain the factor $(k+\beta-1)$ or $(k+\beta+N)$, contrary to the maximality of $N$. Therefore
$$r(-\beta) = q(-\beta-N) = 0.$$
But this contradicts condition (5.118). Hence $s(k)$ must be a polynomial.
The remaining task is to decide whether there exists a polynomial $s(k)$ satisfying (5.121), when $p(k)$, $q(k)$, and $r(k)$ are given polynomials. It’s easy to decide this for polynomials of any particular degree $d$, since we can write
$$s(k) = \alpha_dk^d + \alpha_{d-1}k^{d-1} + \cdots + \alpha_0\,,\qquad \alpha_d\ne0$$
for unknown coefficients $(\alpha_d,\ldots,\alpha_0)$ and plug this expression into the defining equation. The polynomial $s(k)$ will satisfy the recurrence if and only if the $\alpha$’s satisfy certain linear equations, because each power of $k$ must have the same coefficient on both sides of (5.121).
But how can we determine the degree of $s$? It turns out that there actually are at most two possibilities. We can rewrite (5.121) in the form
$$2p(k) = Q(k)\bigl(s(k+1)+s(k)\bigr) + R(k)\bigl(s(k+1)-s(k)\bigr), \tag{5.123}$$
where $Q(k) = q(k)-r(k)$ and $R(k) = q(k)+r(k)$.
If $s(k)$ has degree $d$, then the sum $s(k+1)+s(k) = 2\alpha_dk^d+\cdots$ also has degree $d$, while the difference $s(k+1)-s(k) = \Delta s(k) = d\alpha_dk^{d-1}+\cdots$ has degree $d-1$. (The zero polynomial can be assumed to have degree $-1$.) Let’s write $\deg(p)$ for the degree of a polynomial $p$. If $\deg(Q)\ge\deg(R)$, then the degree of the right-hand side of (5.123) is $\deg(Q)+d$, so we must have $d = \deg(p)-\deg(Q)$. On the other hand if $\deg(Q) < \deg(R) = d'$, we can write $Q(k) = \beta k^{d'-1}+\cdots$ and $R(k) = \gamma k^{d'}+\cdots$ where $\gamma\ne0$; the right-hand side of (5.123) has the form
$$(2\beta\alpha_d + \gamma d\,\alpha_d)\,k^{d+d'-1} + \cdots.$$
Ergo, two possibilities: Either $2\beta+\gamma d\ne0$, and $d = \deg(p)-\deg(R)+1$; or $2\beta+\gamma d = 0$, and $d > \deg(p)-\deg(R)+1$. The second case needs to be examined only if $-2\beta/\gamma$ is an integer $d$ greater than $\deg(p)-\deg(R)+1$.
Thus we have enough facts to decide if a suitable polynomial $s(k)$ exists. If so, we can plug it into (5.120) and we have our $T$. If not, we’ve proved that $\sum t(k)\,\delta k$ is not a hypergeometric term.
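The final step, finding the coefficients of $s(k)$ once $p$, $q$, $r$ and a candidate degree $d$ are known, is plain linear algebra. The following is a minimal sketch of that one step only, not of the whole decision procedure. It assumes polynomials are given as coefficient lists in ascending powers, it replaces coefficient matching by sampling (5.121) at enough integer points, and the names `poly_eval` and `solve_for_s` are hypothetical.

```python
from fractions import Fraction

def poly_eval(coeffs, x):
    """Evaluate a polynomial given as [c0, c1, ...] (ascending powers)."""
    result = Fraction(0)
    for c in reversed(coeffs):
        result = result * x + c
    return result

def solve_for_s(p, q, r, d):
    """Look for s of degree <= d with p(k) = q(k) s(k+1) - r(k) s(k).

    Returns [alpha_0, ..., alpha_d], or None if no such polynomial exists.
    """
    unknowns = d + 1
    # Enough sample points to pin down a polynomial identity of this degree.
    samples = max(len(p), max(len(q), len(r)) + d) + 1
    rows = []
    for k in range(samples):
        qk, rk = poly_eval(q, k), poly_eval(r, k)
        row = [qk * Fraction(k + 1) ** j - rk * Fraction(k) ** j
               for j in range(unknowns)]
        rows.append(row + [poly_eval(p, k)])
    # Gauss-Jordan elimination over the rationals.
    pivot_cols, rank = [], 0
    for col in range(unknowns):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        rows[rank] = [x / rows[rank][col] for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[rank])]
        pivot_cols.append(col)
        rank += 1
    if any(row[-1] != 0 for row in rows[rank:]):
        return None  # the linear equations are inconsistent
    solution = [Fraction(0)] * unknowns
    for i, col in enumerate(pivot_cols):
        solution[col] = rows[i][-1]
    return solution
```

For instance, with $p(k)=1$, $q(k)=k-5$, $r(k)=k$, and $d=0$ the call `solve_for_s([1], [-5, 1], [0, 1], 0)` yields the single coefficient $-\frac15$, while changing $q(k)$ to $5-k$ makes the system inconsistent and the function returns `None`.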
Time for an example. Let’s try the partial sum (5.114); Gosper’s method should be able to deduce the value of
$$\sum\binom{n}{k}(-1)^k\,\delta k$$
for any fixed $n$. Ignoring factors that don’t involve $k$, we want the sum of
$$t(k) = \frac{(-1)^k}{k!\,(n-k)!}\,.$$
The first step is to put the term ratio into the required form (5.117); we have
$$\frac{t(k+1)}{t(k)} = \frac{(k-n)}{(k+1)} = \frac{p(k+1)\,q(k)}{p(k)\,r(k+1)}\,,$$
so we simply take $p(k)=1$, $q(k)=k-n$, and $r(k)=k$. This choice of $p$, $q$, and $r$ satisfies (5.118), unless $n$ is a negative integer; let’s suppose it isn’t.
[Margin: Why isn’t it $r(k) = k+1$? Oh, I see.]
According to (5.123), we should consider the polynomials $Q(k) = -n$ and $R(k) = 2k-n$. Since $R$ has larger degree than $Q$, we need to look at two cases. Either $d = \deg(p)-\deg(R)+1$, which is $0$; or $d = -2\beta/\gamma$ where $\beta=-n$ and $\gamma=2$, hence $d=n$. The first case is nicer, so let’s try it first: Equation (5.121) is
$$1 = (k-n)\alpha_0 - k\alpha_0\,,$$
and so we choose $\alpha_0 = -1/n$. This satisfies the required conditions and gives
$$cT(k) = \frac{r(k)\,s(k)\,t(k)}{p(k)} = k\cdot\Bigl(\frac{-1}{n}\Bigr)\cdot\binom{n}{k}(-1)^k = \binom{n-1}{k-1}(-1)^{k-1},$$
which is the answer we were hoping to confirm.
If we apply the same method to find the indefinite sum $\sum\binom{n}{k}\,\delta k$, without the $(-1)^k$, everything will be almost the same except that $q(k)$ will be $n-k$; hence $Q(k) = n-2k$ will have greater degree than $R(k) = n$, and we will conclude that $d$ has the impossible value $\deg(p)-\deg(Q) = -1$. Therefore the function $\binom{n}{k}$ is not summable in hypergeometric terms.
However, once we have eliminated the impossible, whatever remains, however improbable, must be the truth (according to S. Holmes [70]). When we defined $p$, $q$, and $r$ we decided to ignore the possibility that $n$ might be a negative integer. What if it is? Let’s set $n=-N$, where $N$ is positive. Then the term ratio for $\sum\binom{n}{k}\,\delta k$ is
$$\frac{t(k+1)}{t(k)} = \frac{-(k+N)}{(k+1)} = \frac{p(k+1)}{p(k)}\,\frac{q(k)}{r(k+1)}$$
and it should be represented by $p(k) = (k+1)^{\overline{N-1}}$, $q(k) = -1$, $r(k) = 1$. Gosper’s method now tells us to look for a polynomial $s(k)$ of degree $d = N-1$; maybe there’s hope after all. For example, when $N=2$ we want to solve
$$k+1 = -\bigl((k+1)\alpha_1+\alpha_0\bigr) - (k\alpha_1+\alpha_0).$$
Equating coefficients of $k$ and $1$ tells us that
$$1 = -\alpha_1-\alpha_1\,;\qquad 1 = -\alpha_1-\alpha_0-\alpha_0\,;$$
hence $s(k) = -\frac12k-\frac14$ is a solution, and
$$cT(k) = \frac{1\cdot\bigl(-\frac12k-\frac14\bigr)\cdot\binom{-2}{k}}{k+1} = (-1)^{k-1}\,\frac{2k+1}{4}\,.$$
Can this be the desired sum? Yes, it checks out:
$$(-1)^k\,\frac{2k+3}{4} - (-1)^{k-1}\,\frac{2k+1}{4} = (-1)^k(k+1) = \binom{-2}{k}.$$
We can write the summation formula in another form,
$$\sum_{k < m}\binom{-2}{k} = (-1)^{m-1}\Bigl\lceil\frac m2\Bigr\rceil.$$
This representation conceals the fact that $\binom{-2}{k}$ is summable in hypergeometric terms, because $\lceil m/2\rceil$ is not a hypergeometric term.
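The case $N=2$ is small enough to verify term by term. This sketch assumes integer $k\ge0$ and uses $\binom{-2}{k} = (-1)^k(k+1)$; the function names `t`, `cT`, and `partial_sum` are hypothetical, chosen to echo the notation above.

```python
from fractions import Fraction

def t(k):
    """binom(-2, k) = (-1)^k (k + 1) for integer k >= 0."""
    return (-1) ** k * (k + 1)

def cT(k):
    """The antidifference (-1)^(k-1) (2k+1)/4, written without negative exponents."""
    return Fraction(-((-1) ** k) * (2 * k + 1), 4)

def partial_sum(m):
    """Sum of binom(-2, k) over 0 <= k < m."""
    return sum(t(k) for k in range(m))

telescopes = all(cT(k + 1) - cT(k) == t(k) for k in range(30))
ceiling_form = all(partial_sum(m) == -((-1) ** m) * ((m + 1) // 2) for m in range(30))
```

Both checks succeed, so the hypergeometric-term answer and the ceiling-function answer describe the same partial sums, differing only by the constant $cT(0) = -\frac14$.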
A catalog of summable hypergeometric terms makes a useful addition to the database of hypergeometric sums mentioned earlier in this chapter. Let’s try to compile a list of the sums-in-hypergeometric-terms that we know. The geometric series $\sum z^k\,\delta k$ is a very special case, which can be written $\sum z^k\,\delta k = (z-1)^{-1}z^k + C$ or
$$\sum F\!\left({1,\,1 \atop 1}\,\middle|\,z\right)_{\!k}\delta k = -\frac{1}{1-z}\,F\!\left({1,\,1 \atop 1}\,\middle|\,z\right)_{\!k} + C. \tag{5.124}$$
[Margin: “Excellent, Holmes!” “Elementary, my dear Watson.”]
We also computed $\sum kz^k\,\delta k$ in Chapter 2. This summand is zero when $k = 0$, so we get a more suitable hypergeometric term by considering the sum $\sum(k+1)z^k\,\delta k$ instead. The appropriate formula turns out to be
(5.125)
in hypergeometric notation.
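There’s also the formula $\sum\binom{k}{n}\,\delta k = \binom{k}{n+1}$, equation (5.10); we write it $\sum\binom{k+n+1}{n}\,\delta k = \binom{k+n+1}{n+1}$, to avoid division by zero, and get
$$\sum F\Bigl({n+2,\,1\atop 2}\Bigm|1\Bigr)_k\,\delta k = \frac{1}{n+1}\,F\Bigl({n+2,\,1\atop 1}\Bigm|1\Bigr)_k,\qquad n\ne-1. \tag{5.126}$$
Identity (5.9) turns out to be equivalent to this, when we express it hypergeometrically.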
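In general if we have a summation formula of the form
$$\sum F\Bigl({a_1,\ldots,a_m,\,1\atop b_1,\ldots,b_n}\Bigm|z\Bigr)_k\,\delta k = c\,F\Bigl({A_1,\ldots,A_M,\,1\atop B_1,\ldots,B_N}\Bigm|z\Bigr)_k, \tag{5.127}$$
then we also have
$$\sum F\Bigl({a_1,\ldots,a_m,\,1\atop b_1,\ldots,b_n}\Bigm|z\Bigr)_{k+l}\,\delta k = c\,F\Bigl({A_1,\ldots,A_M,\,1\atop B_1,\ldots,B_N}\Bigm|z\Bigr)_{k+l}$$
for any integer $l$. There’s a general formula for shifting the index by $l$:
$$F\Bigl({a_1,\ldots,a_m\atop b_1,\ldots,b_n}\Bigm|z\Bigr)_{k+l} = \frac{a_1^{\overline{l}}\ldots a_m^{\overline{l}}}{b_1^{\overline{l}}\ldots b_n^{\overline{l}}}\,\frac{z^l}{l!}\,F\Bigl({a_1+l,\ldots,a_m+l,\,1\atop b_1+l,\ldots,b_n+l,\,l+1}\Bigm|z\Bigr)_k.$$
Hence any given identity (5.127) has an infinite number of shifted forms:
$$\sum F\Bigl({a_1+l,\ldots,a_m+l,\,1\atop b_1+l,\ldots,b_n+l}\Bigm|z\Bigr)_k\,\delta k = c\,\frac{b_1^{\overline{l}}\ldots b_n^{\overline{l}}}{a_1^{\overline{l}}\ldots a_m^{\overline{l}}}\,\frac{A_1^{\overline{l}}\ldots A_M^{\overline{l}}}{B_1^{\overline{l}}\ldots B_N^{\overline{l}}}\,F\Bigl({A_1+l,\ldots,A_M+l,\,1\atop B_1+l,\ldots,B_N+l}\Bigm|z\Bigr)_k. \tag{5.128}$$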
There’s usually a fair amount of cancellation among the a’s, A’s, b’s, and
B’s here. For example, if we apply this shift formula to
(5.126),
we get the
general identity
k6k =
sF(n+;';'lll)k,
(5.129)
valid for all n # -1. The shifted version of (5.125) is
-1 L+l/(l-2)
F
ZZ---
l-z
1+1
(5.130)
With a bit of patience, we can compute a few more indefinite summation
identities that are potentially useful:
a,
2+(1-a)z/(l-z),
1
l+(l-a)z/(l-z),2
a, b,
c+l,
(c-ab)/(c-a-b+l),
2
c+l,
a+b-c+l
=
(c)(c-b-a)
(c
-
a)(c
-
b)
F
(,,“dI;l,j
‘)k.
(5.133)
Exercises
Warmups
1 What is $11^4$? Why is this number easy to compute, for a person who knows binomial coefficients?
2 For which value(s) of $k$ is $\binom{n}{k}$ a maximum, when $n$ is a given positive integer? Prove your answer.
3 Prove the hexagon property, $\binom{n-1}{k-1}\binom{n}{k+1}\binom{n+1}{k} = \binom{n-1}{k}\binom{n+1}{k+1}\binom{n}{k-1}$.
4 Evaluate $\binom{-1}{k}$ by negating (actually un-negating) its upper index.
5 Let $p$ be prime. Show that $\binom{p}{k} \bmod p = 0$ for $0<k<p$. What does this imply about the binomial coefficients $\binom{p-1}{k}$?
6 Fix up the text’s derivation in Problem 6, Section 5.2, by correctly applying symmetry.
A case of mistaken identity.
7 Is (5.34) true also when $k<0$?
8 Evaluate $\sum_k\binom{n}{k}(-1)^k(1-k/n)^n$. What is the approximate value of this sum, when $n$ is very large? Hint: This sum is $\Delta^nf(0)$ for some function $f$.
9 Show that the generalized exponentials of (5.58) obey the law
$$\mathcal{E}_t(z) = \mathcal{E}(tz)^{1/t},\qquad\text{if $t\ne0$},$$
where $\mathcal{E}(z)$ is an abbreviation for $\mathcal{E}_1(z)$.
10 Show that $-2(\ln(1-z)+z)/z^2$ is a hypergeometric function.
11 Express the two functions
$$\sin z = z - \frac{z^3}{3!} + \frac{z^5}{5!} - \frac{z^7}{7!} + \cdots$$
$$\arcsin z = z + \frac{1\cdot z^3}{2\cdot3} + \frac{1\cdot3\cdot z^5}{2\cdot4\cdot5} + \frac{1\cdot3\cdot5\cdot z^7}{2\cdot4\cdot6\cdot7} + \cdots$$
in terms of hypergeometric series.
12 Which of the following functions of $k$ is a “hypergeometric term,” in the sense of (5.115)? Explain why or why not.
a $n^k$.
b $k^n$.
(Here t and T
aren’t necessar-
ily related as in
~w9~J
c $(k! + (k+1)!)/2$.
d $H_k$, that is, $\frac11 + \frac12 + \cdots + \frac1k$.
e $t(k)T(n-k)/T(n)$, when $t$ and $T$ are hypergeometric terms.
f $(t(k)+T(k))/2$, when $t$ and $T$ are hypergeometric terms.
g $(at(k) + bt(k+1) + ct(k+2))/(a + bt(1) + ct(2))$, when $t$ is a hypergeometric term.
Basics
13 Find relations between the superfactorial function $P_n = \prod_{k=1}^nk!$ of exercise 4.55, the hyperfactorial function $Q_n = \prod_{k=1}^nk^k$, and the product $R_n = \prod_{k=0}^n\binom{n}{k}$.
14 Prove identity (5.25) by negating the upper index in Vandermonde’s convolution (5.22). Then show that another negation yields (5.26).
15 What is $\sum_k\binom{n}{k}^3(-1)^k$? Hint: See (5.29).
16 Evaluate the sum
$$\sum_k\binom{2a}{a+k}\binom{2b}{b+k}\binom{2c}{c+k}(-1)^k$$
when $a$, $b$, $c$ are nonnegative integers.
17 Find a simple relation between $\binom{2n-1/2}{n}$ and $\binom{2n-1/2}{2n}$.
18 Find an alternative form analogous to (5.35) for the product $\binom{r}{k}\binom{r-1/3}{k}\binom{r-2/3}{k}$.
19 Show that the generalized binomials of (5.58) obey the law $\mathcal{B}_t(z) = \mathcal{B}_{1-t}(-z)^{-1}$.
20 Define a “generalized bloopergeometric series” by the formula
$$G\Bigl({a_1,\ldots,a_m\atop b_1,\ldots,b_n}\Bigm|z\Bigr) = \sum_{k\ge0}\frac{a_1^{\underline{k}}\ldots a_m^{\underline{k}}}{b_1^{\underline{k}}\ldots b_n^{\underline{k}}}\,\frac{z^k}{k!},$$
using falling powers instead of the rising ones in (5.76). Explain how $G$ is related to $F$.
21 Show that Euler’s definition of factorials is consistent with the ordinary definition, by showing that the limit in (5.83) is $1/\bigl(m(m-1)\ldots(1)\bigr)$ when $z = m$ is a positive integer.
22 Use (5.83) to prove the factorial duplication formula:
$$x!\,(x-\tfrac12)! = (2x)!\,(-\tfrac12)!/2^{2x}.$$
23 What is the value of $F(-n,1;\,;1)$?
24 Find $\sum_k\binom{n}{m+k}\binom{m+k}{2k}4^k$ by using hypergeometric series.
25 Show that
$$(a_1-b_1)\,F\Bigl({a_1,a_2,\ldots,a_m\atop b_1+1,b_2,\ldots,b_n}\Bigm|z\Bigr) = a_1F\Bigl({a_1+1,a_2,\ldots,a_m\atop b_1+1,b_2,\ldots,b_n}\Bigm|z\Bigr) - b_1F\Bigl({a_1,a_2,\ldots,a_m\atop b_1,b_2,\ldots,b_n}\Bigm|z\Bigr).$$
Find a similar relation between the hypergeometrics $F(a_1,a_2,a_3,\ldots,a_m;b_1,\ldots,b_n;z)$, $F(a_1+1,a_2,a_3,\ldots,a_m;b_1,\ldots,b_n;z)$, and $F(a_1,a_2+1,a_3,\ldots,a_m;b_1,\ldots,b_n;z)$.
26 Express the function $G(z)$ in the formula
$$F\Bigl({a_1,\ldots,a_m\atop b_1,\ldots,b_n}\Bigm|z\Bigr) = 1 + G(z)$$
as a multiple of a hypergeometric series.
By the way, $(-\frac12)! = \sqrt{\pi}$.
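27 Prove that
$$F\Bigl({a_1,a_1+\frac12,\ldots,a_m,a_m+\frac12\atop b_1,b_1+\frac12,\ldots,b_n,b_n+\frac12,\frac12}\Bigm|(2^{m-n-1}z)^2\Bigr) = \frac12\biggl(F\Bigl({2a_1,\ldots,2a_m\atop 2b_1,\ldots,2b_n}\Bigm|z\Bigr) + F\Bigl({2a_1,\ldots,2a_m\atop 2b_1,\ldots,2b_n}\Bigm|-z\Bigr)\biggr).$$
28 Prove Euler’s identity
$$F\Bigl({a,b\atop c}\Bigm|z\Bigr) = (1-z)^{c-a-b}F\Bigl({c-a,c-b\atop c}\Bigm|z\Bigr)$$
by applying Pfaff’s reflection law (5.101) twice.
29 Show that confluent hypergeometrics satisfy
$$e^zF\Bigl({a\atop b}\Bigm|-z\Bigr) = F\Bigl({b-a\atop b}\Bigm|z\Bigr).$$
30 What hypergeometric series $F$ satisfies $zF'(z) + F(z) = 1/(1-z)$?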
31 Show that if $f(k)$ is any function summable in hypergeometric terms, then $f$ itself is a multiple of a hypergeometric term. In other words, if $\sum f(k)\,\delta k = cF(A_1,\ldots,A_M;B_1,\ldots,B_N;Z)_k + C$, then there exist constants $a_1$, …, $a_m$, $b_1$, …, $b_n$, and $z$ such that $f(k)$ is a constant times $F(a_1,\ldots,a_m;b_1,\ldots,b_n;z)_k$.
32 Find $\sum k^2\,\delta k$ by Gosper’s method.
33 Use Gosper’s method to find $\sum\delta k/(k^2-1)$.
34 Show that a partial hypergeometric sum can always be represented as a limit of ordinary hypergeometrics:
$$\sum_{k\le c}F\Bigl({a_1,\ldots,a_m\atop b_1,\ldots,b_n}\Bigm|z\Bigr)_k = \lim_{\epsilon\to0}F\Bigl({-c,a_1,\ldots,a_m\atop \epsilon-c,b_1,\ldots,b_n}\Bigm|z\Bigr),$$
when $c$ is a nonnegative integer. Use this idea to evaluate $\sum_{k\le m}\binom{n}{k}(-1)^k$.
Homework exercises
35 The notation $\sum_{k\le n}\binom{n}{k}2^{k-n}$ is ambiguous without context. Evaluate it
a as a sum on $k$;
b as a sum on $n$.
36 Let $p^k$ be the largest power of the prime $p$ that divides $\binom{m+n}{m}$, when $m$ and $n$ are nonnegative integers. Prove that $k$ is the number of carries that occur when $m$ is added to $n$ in the radix $p$ number system. Hint: Exercise 4.24 helps here.
37 Show that an analog of the binomial theorem holds for factorial powers. That is, prove the identities
$$(x+y)^{\underline{n}} = \sum_k\binom{n}{k}x^{\underline{k}}\,y^{\underline{n-k}},\qquad (x+y)^{\overline{n}} = \sum_k\binom{n}{k}x^{\overline{k}}\,y^{\overline{n-k}},$$
for all nonnegative integers $n$.
38 Show that all nonnegative integers $n$ can be represented uniquely in the form $n = \binom{a}{1}+\binom{b}{2}+\binom{c}{3}$ where $a$, $b$, and $c$ are integers with $0\le a<b<c$. (This is called the binomial number system.)
39 Show that if $xy = ax+by$ then $x^ny^n = \sum_{k=1}^n\binom{2n-1-k}{n-1}(a^nb^{n-k}x^k + a^{n-k}b^ny^k)$ for all $n>0$. Find a similar formula for the more general product $x^my^n$.
40 Find a closed form for
integers $m,n\ge0$.
41 Evaluate $\sum_k\binom{n}{k}k!/(n+1+k)!$ when $n$ is a nonnegative integer.
42 Find the indefinite sum $\sum\bigl((-1)^x/\binom{n}{x}\bigr)\,\delta x$, and use it to compute the sum $\sum_{k=0}^n(-1)^k/\binom{n}{k}$ in closed form.
43 Prove the triple-binomial identity (5.28). Hint: First replace $\binom{r+k}{m+n}$ by $\sum_j\binom{r}{m+n-j}\binom{k}{j}$.
44
Use identity (5.32) to find closed forms for the double sums
~(-l)“k(i~k)
(3)
(L)
(m’~~~-k)
and
jF,ll)j+k(;)
(l;)
(bk)
(:)/(;x)
,
/
given integers $m\ge a\ge0$ and $n\ge b\ge0$.
45 Find a closed form for $\sum_{k\le n}\binom{2k}{k}4^{-k}$.
46 Evaluate the following sum in closed form, when $n$ is a positive integer:
Hint: Generating functions win again.
47 The sum $\sum_k\binom{rk+s}{k}\binom{rn-rk-s}{n-k}$ is a polynomial in $r$ and $s$. Show that it doesn’t depend on $s$.
48 The identity $\sum_{k\le n}\binom{n+k}{n}2^{-k} = 2^n$ can be combined with $\sum_{k\ge0}\binom{n+k}{n}z^k = 1/(1-z)^{n+1}$ to yield $\sum_{k>n}\binom{n+k}{n}2^{-k} = 2^n$. What is the hypergeometric form of the latter identity?
49 Use the hypergeometric method to evaluate
50 Prove Pfaff’s reflection law (5.101) by comparing the coefficients of $z^n$ on both sides of the equation.
51 The derivation of (5.104) shows that
$$\lim_{\epsilon\to0}F(-m,\,-2m-1+\epsilon;\,-2m+\epsilon;\,2) = 1\Big/\binom{-1/2}{m}.$$
In this exercise we will see that slightly different limiting processes lead to distinctly different answers for the degenerate hypergeometric series $F(-m,-2m-1;-2m;2)$.
a Show that $\lim_{\epsilon\to0}F(-m+\epsilon,-2m-1;-2m+2\epsilon;2) = 0$, by using Pfaff’s reflection law to prove the identity $F(a,-2m-1;2a;2) = 0$ for all integers $m\ge0$.
b What is $\lim_{\epsilon\to0}F(-m+\epsilon,-2m-1;-2m+\epsilon;2)$?
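52 Prove that if $N$ is a nonnegative integer,
$$b_1^{\overline{N}}\ldots b_n^{\overline{N}}\,F\Bigl({a_1,\ldots,a_m,-N\atop b_1,\ldots,b_n}\Bigm|z\Bigr) = a_1^{\overline{N}}\ldots a_m^{\overline{N}}\,(-z)^N\,F\Bigl({1-b_1-N,\ldots,1-b_n-N,-N\atop 1-a_1-N,\ldots,1-a_m-N}\Bigm|\frac{(-1)^{m+n}}{z}\Bigr).$$
53 If we put $b = -\frac12$ and $z = 1$ in Gauss’s identity (5.110), the left side reduces to $-1$ while the right side is $+1$. Why doesn’t this prove that $-1 = +1$?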
54 Explain how the right-hand side of (5.112) was obtained.
55 If the hypergeometric terms $t(k) = F(a_1,\ldots,a_m;b_1,\ldots,b_n;z)_k$ and $T(k) = F(A_1,\ldots,A_M;B_1,\ldots,B_N;Z)_k$ satisfy $t(k) = c\bigl(T(k+1)-T(k)\bigr)$ for all $k\ge0$, show that $z = Z$ and $m-n = M-N$.
56 Find a general formula for $\sum\binom{-3}{k}\,\delta k$ using Gosper’s method. Show that $(-1)^{k-1}\bigl\lfloor\frac{k+1}{2}\bigr\rfloor\bigl\lfloor\frac{k+2}{2}\bigr\rfloor$ is also a solution.
57 Use Gosper’s method to find a constant
8
such that
is summable in hypergeometric terms.
58
If m and n are integers with 0 6 m 6 n, let
T
m,n
=
Find a relation between
T,,,n
and T,-1 ,+I, then solve your recurrence
by applying a summation factor.
Exam problems
59 Find a closed form for
when $m$ and $n$ are positive integers.
60 Use Stirling’s approximation (4.23) to estimate $\binom{m+n}{n}$ when $m$ and $n$ are both large. What does your formula reduce to when $m = n$?
61 Prove that when $p$ is prime, we have
for all nonnegative integers $m$ and $n$.
62 Assuming that $p$ is prime and that $m$ and $n$ are positive integers, determine the value of $\binom{np}{mp} \bmod p^2$. Hint: You may wish to use the following generalization of Vandermonde’s convolution:
$$\sum_{k_1+k_2+\cdots+k_m=n}\binom{r_1}{k_1}\binom{r_2}{k_2}\ldots\binom{r_m}{k_m} = \binom{r_1+r_2+\cdots+r_m}{n}.$$
63 Find a closed form for
given an integer $n\ge0$.
72 Prove that, if $m$, $n$, and $k$ are integers and $n>0$,
$$\binom{m/n}{k}\,n^{2k-\nu(k)}\quad\text{is an integer},$$
where $\nu(k)$ is the number of 1’s in the binary representation of $k$.
73 Use the repertoire method to solve the recurrence
$$X_0 = \alpha;\qquad X_1 = \beta;\qquad X_n = (n-1)(X_{n-1}+X_{n-2}),\quad\text{for $n>1$}.$$
Hint: Both $n!$ and $n$¡ satisfy this recurrence.
74 This problem concerns a deviant version of Pascal’s triangle in which the sides consist of the numbers 1, 2, 3, 4, . . . instead of all 1’s, although the interior numbers still satisfy the addition formula:
                1
              2   2
            3   4   3
          4   7   7   4
        5  11  14  11   5
      .   .   .   .   .   .
If $\left(\!\!\binom{n}{k}\!\!\right)$ denotes the $k$th number in row $n$, for $1\le k\le n$, we have $\left(\!\!\binom{n}{1}\!\!\right) = \left(\!\!\binom{n}{n}\!\!\right) = n$, and $\left(\!\!\binom{n}{k}\!\!\right) = \left(\!\!\binom{n-1}{k}\!\!\right) + \left(\!\!\binom{n-1}{k-1}\!\!\right)$ for $1<k<n$. Express the quantity $\left(\!\!\binom{n}{k}\!\!\right)$ in closed form.
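75 Find a relation between the functions
$$S_0(n) = \sum_k\binom{n}{3k},\qquad S_1(n) = \sum_k\binom{n}{3k+1},\qquad S_2(n) = \sum_k\binom{n}{3k+2}$$
and the quantities $\lfloor2^n/3\rfloor$ and $\lceil2^n/3\rceil$.
76 Solve the following recurrence for $n,k\ge0$:
$$Q_{n,0} = 1;\qquad Q_{0,k} = [k=0];\qquad Q_{n,k} = Q_{n-1,k} + Q_{n-1,k-1} + \binom{n}{k},\quad\text{for $n,k>0$}.$$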
77 What is the value of
O<k
&
<,,
,&
(kc’)
ifm>l?
.II!rn\
.
78 Assuming that m is a positive integer, find a closed form for
kmodm
(2kf
1) mod
(2m+
1)
79 a What is the greatest common divisor of $\binom{2n}{1}$, $\binom{2n}{3}$, …, $\binom{2n}{2n-1}$? Hint: Consider the sum of these $n$ numbers.
b Show that the least common multiple of $\binom{n}{0}$, $\binom{n}{1}$, …, $\binom{n}{n}$ is equal to $L(n+1)/(n+1)$, where $L(n) = \operatorname{lcm}(1,2,\ldots,n)$.
80 Prove that $\binom{n}{k}\le(en/k)^k$ for all integers $k,n\ge0$.
81 If $0<\theta<1$ and $0\le x\le1$, and if $l$, $m$, $n$ are nonnegative integers with $m<n$, prove the inequality
$$(-1)^{n-m-1}\sum_k\binom{l}{k}\binom{m+\theta}{n+k}x^k > 0.$$
Hint: Consider taking the derivative with respect to $x$.
Bonus problems
82 Prove that Pascal’s triangle has an even more surprising hexagon property than the one cited in the text:
$$\gcd\Bigl(\binom{n-1}{k-1},\binom{n}{k+1},\binom{n+1}{k}\Bigr) = \gcd\Bigl(\binom{n-1}{k},\binom{n+1}{k+1},\binom{n}{k-1}\Bigr),$$
if $0<k<n$. For example, $\gcd(56,36,210) = \gcd(28,120,126) = 2$.
83 Prove the amazing identity (5.32) by first showing that it’s true whenever
the right-hand side is zero.
84 Show that the second pair of convolution formulas, (5.61), follows from
the first pair, (5.60). Hint: Differentiate with respect to z.
85 Prove that
~il,m
x
m=l
l<kl<kz<...<k,,,$n
(k:+k:+.;+kL+Z”)
=
(-l)nn!3
-
2n
0
n
(The left side is a sum of 2”
-
1 terms.) Hint: Much more is true
86 Let al, . . . , a,, be nonnegative integers, and let C(al,. . . , a,,) be the
coefficient of the constant term
2:.
. .zt when the n(n
-
1) factors
are fully expanded into positive and negative powers of the complex vari-
ables
~1,
. . . . z,,.
a
Prove that
C(al
, . . .
, a,) equals the left-hand side of (5.31).
b Prove that if
21,
. . ,
z,,
are distinct complex numbers, then the
polynomial
f(4
=
f
11
s
k=l
l<j<n
j#k
is identically equal to 1.
C
Multiply the original product of n(n
-
1) factors by f (0) and deduce
that
C(al,al,...,a,)
isequalto
C(al
-l,az,...,
a,)+C(al,a2-l,...,a,)
+ . . .
+C(al,a2
,...,
a,-1).
(This recurrence defines multinomial coefficients, so C(al , . . . , a,)
must equal the right-hand side of
(5.31).)
8’7
Let m be a positive integer and let
L
= eni”“. Show that
rg-,(zm)n+’
= (1 +
m)K,(zm)
-m
_
t
(C2i+1zIBl+,,,(~2i+l~~l/m)n+1
osj<mTm+
1)%+l,,(L2j+l~)-l
-
1
(This reduces to (5.74) in the special case m =
1.)
88 Prove that the coefficients
sk
in (5.47) are eqUa1 to
for all k > 1; hence
/ski
<:
l/(k-
1).
89 Prove that
(5.19)
has an infinite counterpart,
t
(mlr)Xk?Jm-k =
x
(ir)
(-X)k(X+y)“pk, integer m,
k>m
k>m
if
1x1
<
Iy/
and
Ix/
<
Ix
+
y/.
Differentiate this identity n times with
respect to y and express it in terms of hypergeometrics; what relation do
you get?
90
Problem 1 in Section 5.2 considers
tkaO
(3
/(l)
when
r
and s are integers
with s 3 r 3 0. What is the value of this sum if r and s aren’t integers?
91 Prove Whipple’s identity,
F
ia,
;a+;,
l-a-b-c
l+a-b,
l+a-c
= (1
-z)“F
by showing that both sides satisfy the same differential equation.
92 Prove Clausen’s product identities
F
:+a,
$+b
1 +a+b
What identities result
formulas are equated?
=F(
$,
$+a-b,
i--a+b
l+a+b,
l-a-b
when the coefficients of
2”
on both sides of these
93 Show that the indefinite sum
f(i)+a)
has a (fairly) simple form, given any function f and any constant a.
94 Show that if w =
e2ni/3
we have
k+l&x3n
(k,~m)2WL+2m
=
(n,;In)
integer
n
Research problems
95 Let $q(n)$ be the smallest odd prime factor of the middle binomial coefficient $\binom{2n}{n}$. According to exercise 36, the odd primes $p$ that do not divide $\binom{2n}{n}$ are those for which all digits in $n$’s radix $p$ representation are $(p-1)/2$ or less. Computer experiments have shown that $q(n)\le11$ for all $n<10^{10000}$, except that $q(3160) = 13$.
a Is $q(n)\le11$ for all $n>3160$?
b Is $q(n) = 11$ for infinitely many $n$?
A reward of
$(:)
(“,)
(z)
is offered for a solution to either (a) or (b).
96 Is $\binom{2n}{n}$ divisible by the square of a prime, for all $n>4$?
97 For what values of $n$ is $\binom{2n}{n}\equiv(-1)^n \pmod{2n+1}$?
6
Special Numbers
SOME SEQUENCES of numbers arise so often in mathematics that we recognize them instantly and give them special names. For example, everybody who learns arithmetic knows the sequence of square numbers (1, 4, 9, 16, . . . ). In Chapter 1 we encountered the triangular numbers (1, 3, 6, 10, . . . ); in Chapter 4 we studied the prime numbers (2, 3, 5, 7, . . . ); in Chapter 5 we looked briefly at the Catalan numbers (1, 2, 5, 14, . . . ).
In the present chapter we’ll get to know a few other important sequences. First on our agenda will be the Stirling numbers ${n\brace k}$ and ${n\brack k}$, and the Eulerian numbers $\left\langle{n\atop k}\right\rangle$; these form triangular patterns of coefficients analogous to the binomial coefficients $\binom{n}{k}$ in Pascal’s triangle. Then we’ll take a good look at the harmonic numbers $H_n$ and the Bernoulli numbers $B_n$; these differ from the other sequences we’ve been studying because they’re fractions, not integers. Finally, we’ll examine the fascinating Fibonacci numbers $F_n$ and some of their important generalizations.
6.1 STIRLING NUMBERS
We begin with some close relatives of the binomial coefficients, the Stirling numbers, named after James Stirling (1692-1770). These numbers come in two flavors, traditionally called by the no-frills names “Stirling numbers of the first and second kind.” Although they have a venerable history and numerous applications, they still lack a standard notation. We will write ${n\brace k}$ for Stirling numbers of the second kind and ${n\brack k}$ for Stirling numbers of the first kind, because these symbols turn out to be more user-friendly than the many other notations that people have tried.
Tables 244 and 245 show what ${n\brace k}$ and ${n\brack k}$ look like when $n$ and $k$ are small. A problem that involves the numbers “1, 7, 6, 1” is likely to be related to ${n\brace k}$, and a problem that involves “6, 11, 6, 1” is likely to be related to ${n\brack k}$, just as we assume that a problem involving “1, 4, 6, 4, 1” is likely to be related to $\binom{n}{k}$; these are the trademark sequences that appear when $n = 4$.
Table 244 Stirling’s triangle for subsets. (The entry in row n, column k is ${n\brace k}$.)
n    k=0   k=1   k=2   k=3   k=4   k=5   k=6   k=7   k=8   k=9
0      1
1      0     1
2      0     1     1
3      0     1     3     1
4      0     1     7     6     1
5      0     1    15    25    10     1
6      0     1    31    90    65    15     1
7      0     1    63   301   350   140    21     1
8      0     1   127   966  1701  1050   266    28     1
9      0     1   255  3025  7770  6951  2646   462    36     1
Stirling numbers of the second kind show up more often than those of the other variety, so let’s consider last things first. The symbol ${n\brace k}$ stands for the number of ways to partition a set of $n$ things into $k$ nonempty subsets. For example, there are seven ways to split a four-element set into two parts:
$$\{1,2,3\}\cup\{4\},\quad \{1,2,4\}\cup\{3\},\quad \{1,3,4\}\cup\{2\},\quad \{2,3,4\}\cup\{1\},$$
$$\{1,2\}\cup\{3,4\},\quad \{1,3\}\cup\{2,4\},\quad \{1,4\}\cup\{2,3\}; \tag{6.1}$$
thus ${4\brace2} = 7$. Notice that curly braces are used to denote sets as well as the numbers ${n\brace k}$. This notational kinship helps us remember the meaning of ${n\brace k}$, which can be read “$n$ subset $k$.”
(Stirling himself considered this kind first in his book [281].)
Let’s look at small $k$. There’s just one way to put $n$ elements into a single nonempty set; hence ${n\brace1} = 1$, for all $n>0$. On the other hand ${0\brace1} = 0$, because a 0-element set is empty.
The case $k = 0$ is a bit tricky. Things work out best if we agree that there’s just one way to partition an empty set into zero nonempty parts; hence ${0\brace0} = 1$. But a nonempty set needs at least one part, so ${n\brace0} = 0$ for $n>0$.
What happens when $k = 2$? Certainly ${0\brace2} = 0$. If a set of $n>0$ objects is divided into two nonempty parts, one of those parts contains the last object and some subset of the first $n-1$ objects. There are $2^{n-1}$ ways to choose the latter subset, since each of the first $n-1$ objects is either in it or out of it; but we mustn’t put all of those objects in it, because we want to end up with two nonempty parts. Therefore we subtract 1:
$${n\brace2} = 2^{n-1}-1,\qquad\text{integer $n>0$}. \tag{6.2}$$
(This tallies with our enumeration of ${4\brace2} = 7 = 2^3-1$ ways above.)
Table 245 Stirling’s triangle for cycles. (The entry in row n, column k is ${n\brack k}$.)
n    k=0     k=1     k=2     k=3     k=4     k=5     k=6     k=7     k=8     k=9
0      1
1      0       1
2      0       1       1
3      0       2       3       1
4      0       6      11       6       1
5      0      24      50      35      10       1
6      0     120     274     225      85      15       1
7      0     720    1764    1624     735     175      21       1
8      0    5040   13068   13132    6769    1960     322      28       1
9      0   40320  109584  118124   67284   22449    4536     546      36       1
A modification of this argument leads to a recurrence by which we can compute ${n\brace k}$ for all $k$: Given a set of $n>0$ objects to be partitioned into $k$ nonempty parts, we either put the last object into a class by itself (in ${n-1\brace k-1}$ ways), or we put it together with some nonempty subset of the first $n-1$ objects. There are $k{n-1\brace k}$ possibilities in the latter case, because each of the ${n-1\brace k}$ ways to distribute the first $n-1$ objects into $k$ nonempty parts gives $k$ subsets that the $n$th object can join. Hence
$${n\brace k} = k{n-1\brace k} + {n-1\brace k-1},\qquad\text{integer $n>0$}. \tag{6.3}$$
This is the law that generates Table 244; without the factor of $k$ it would reduce to the addition formula (5.8) that generates Pascal’s triangle.
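Recurrence (6.3) translates directly into a row-by-row computation. The following is a minimal sketch in Python; the function name `stirling_subset_rows` is an illustrative choice, and the boundary values (1 in row 0, and 0 outside the triangle) are assumed as in the discussion of small $k$ above.

```python
def stirling_subset_rows(n_max):
    """Return rows 0..n_max of Stirling's triangle for subsets via (6.3)."""
    rows = [[1]]  # row 0: one way to partition the empty set into zero parts
    for n in range(1, n_max + 1):
        prev = rows[-1]  # row n-1, holding entries for k = 0..n-1
        row = []
        for k in range(n + 1):
            same = prev[k] if k < n else 0       # (n-1 subset k), 0 if k > n-1
            fewer = prev[k - 1] if k > 0 else 0  # (n-1 subset k-1), 0 if k = 0
            row.append(k * same + fewer)
        rows.append(row)
    return rows

rows = stirling_subset_rows(9)
print(rows[4])  # [0, 1, 7, 6, 1]
```

Row 4 shows the trademark sequence 1, 7, 6, 1, and column 2 agrees with (6.2).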
And now, Stirling numbers of the first kind. These are somewhat like the others, but ${n\brack k}$ counts the number of ways to arrange $n$ objects into $k$ cycles instead of subsets. We verbalize ‘${n\brack k}$’ by saying “$n$ cycle $k$.”
Cycles are cyclic arrangements, like the necklaces we considered in Chapter 4. The cycle
[figure: the elements A, B, C, D arranged around a circle]
can be written more compactly as ‘[A, B, C, D]’, with the understanding that
$$[A,B,C,D] = [B,C,D,A] = [C,D,A,B] = [D,A,B,C];$$
a cycle “wraps around” because its end is joined to its beginning. On the other hand, the cycle [A, B, C, D] is not the same as [A, B, D, C] or [D, C, B, A].
There are eleven different ways to make two cycles from four elements:
$$[1,2,3]\,[4],\quad [1,2,4]\,[3],\quad [1,3,4]\,[2],\quad [2,3,4]\,[1],$$
$$[1,3,2]\,[4],\quad [1,4,2]\,[3],\quad [1,4,3]\,[2],\quad [2,4,3]\,[1],$$
$$[1,2]\,[3,4],\quad [1,3]\,[2,4],\quad [1,4]\,[2,3]; \tag{6.4}$$
hence ${4\brack2} = 11$.
“There are nine and sixty ways of constructing tribal lays, And-every-single-one-of-them-is-right.”
-Rudyard Kipling
A singleton cycle (that is, a cycle with only one element) is essentially the same as a singleton set (a set with only one element). Similarly, a 2-cycle is like a 2-set, because we have $[A,B] = [B,A]$ just as $\{A,B\} = \{B,A\}$. But there are two different 3-cycles, [A, B, C] and [A, C, B]. Notice, for example, that the eleven cycle pairs in (6.4) can be obtained from the seven set pairs in (6.1) by making two cycles from each of the 3-element sets.
In general, $n!/n = (n-1)!$ cycles can be made from any $n$-element set, whenever $n>0$. (There are $n!$ permutations, and each cycle corresponds to $n$ of them because any one of its elements can be listed first.) Therefore we have
$${n\brack1} = (n-1)!,\qquad\text{integer $n>0$}. \tag{6.5}$$
This is much larger than the value ${n\brace1} = 1$ we had for Stirling subset numbers. In fact, it is easy to see that the cycle numbers must be at least as large as the subset numbers,
$${n\brack k}\ge{n\brace k},\qquad\text{integers $n,k\ge0$}, \tag{6.6}$$
because every partition into nonempty subsets leads to at least one arrangement of cycles.
Equality holds in (6.6) when all the cycles are necessarily singletons or doubletons, because cycles are equivalent to subsets in such cases. This happens when $k = n$ and when $k = n-1$; hence
$${n\brack n} = {n\brace n};\qquad {n\brack n-1} = {n\brace n-1}.$$
In fact, it is easy to see that
$${n\brack n} = {n\brace n} = 1;\qquad {n\brack n-1} = {n\brace n-1} = \binom{n}{2}. \tag{6.7}$$
(The number of ways to arrange $n$ objects into $n-1$ cycles or subsets is the number of ways to choose the two objects that will be in the same cycle or subset.) The triangular numbers $\binom{n}{2} = 1, 3, 6, 10, \ldots$ are conspicuously present in both Table 244 and Table 245.
We can derive a recurrence for ${n\brack k}$ by modifying the argument we used for ${n\brace k}$. Every arrangement of $n$ objects in $k$ cycles either puts the last object into a cycle by itself (in ${n-1\brack k-1}$ ways) or inserts that object into one of the ${n-1\brack k}$ cycle arrangements of the first $n-1$ objects. In the latter case, there are $n-1$ different ways to do the insertion. (This takes some thought, but it’s not hard to verify that there are $j$ ways to put a new element into a $j$-cycle in order to make a $(j+1)$-cycle. When $j = 3$, for example, the cycle [A, B, C] leads to
[A, B, C, D], [A, B, D, C], or [A, D, B, C]
when we insert a new element D, and there are no other possibilities. Summing over all $j$ gives a total of $n-1$ ways to insert an $n$th object into a cycle decomposition of $n-1$ objects.) The desired recurrence is therefore
$${n\brack k} = (n-1){n-1\brack k} + {n-1\brack k-1},\qquad\text{integer $n>0$}. \tag{6.8}$$
This is the addition-formula analog that generates Table 245.
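The cycle recurrence (6.8) can be tabulated in the same way as the subset recurrence; only the multiplier changes from $k$ to $n-1$. This is a minimal sketch in Python, with the illustrative name `stirling_cycle_rows` and the same assumed boundary values (1 in row 0, 0 outside the triangle).

```python
from math import factorial

def stirling_cycle_rows(n_max):
    """Return rows 0..n_max of Stirling's triangle for cycles via (6.8)."""
    rows = [[1]]  # row 0
    for n in range(1, n_max + 1):
        prev = rows[-1]  # row n-1, holding entries for k = 0..n-1
        row = []
        for k in range(n + 1):
            same = prev[k] if k < n else 0       # [n-1 cycle k]
            fewer = prev[k - 1] if k > 0 else 0  # [n-1 cycle k-1]
            row.append((n - 1) * same + fewer)
        rows.append(row)
    return rows

rows = stirling_cycle_rows(9)
print(rows[4])  # [0, 6, 11, 6, 1]
# Column 1 is (n-1)!, as in (6.5).
assert all(rows[n][1] == factorial(n - 1) for n in range(1, 10))
```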
Comparison of (6.8) and (6.3) shows that the first term on the right side is multiplied by its upper index $(n-1)$ in the case of Stirling cycle numbers, but by its lower index $k$ in the case of Stirling subset numbers. We can therefore perform “absorption” in terms like $n{n\brack k}$ and $k{n\brace k}$, when we do proofs by mathematical induction.
Every permutation is equivalent to a set of cycles. For example, consider the permutation that takes 123456789 into 384729156. We can conveniently represent it in two rows,
1 2 3 4 5 6 7 8 9
3 8 4 7 2 9 1 5 6,
showing that 1 goes to 3 and 2 goes to 8, etc. The cycle structure comes about because 1 goes to 3, which goes to 4, which goes to 7, which goes back to 1; that’s the cycle [1, 3, 4, 7]. Another cycle in this permutation is [2, 8, 5]; still another is [6, 9]. Therefore the permutation 384729156 is equivalent to the cycle arrangement
[1, 3, 4, 7] [2, 8, 5] [6, 9].
If we have any permutation $\pi_1\pi_2\ldots\pi_n$ of $\{1,2,\ldots,n\}$, every element is in a unique cycle. For if we start with $m_0 = m$ and look at $m_1 = \pi_{m_0}$, $m_2 = \pi_{m_1}$, etc., we must eventually come back to $m_k = m_0$. (The numbers must repeat sooner or later, and the first number to reappear must be $m_0$ because we know the unique predecessors of the other numbers $m_1$, $m_2$, …, $m_{k-1}$.) Therefore every permutation defines a cycle arrangement. Conversely, every cycle arrangement obviously defines a permutation if we reverse the construction, and this one-to-one correspondence shows that permutations and cycle arrangements are essentially the same thing.
Therefore ${n\brack k}$ is the number of permutations of $n$ objects that contain exactly $k$ cycles. If we sum ${n\brack k}$ over all $k$, we must get the total number of permutations:
$$\sum_{k=0}^n{n\brack k} = n!,\qquad\text{integer $n\ge0$}. \tag{6.9}$$
For example, $6+11+6+1 = 24 = 4!$.
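The correspondence between permutations and cycle arrangements is easy to carry out by machine. The sketch below, in Python, follows each element until it returns to its starting point, as in the argument above; the function name `cycles` and the convention of starting each cycle at its smallest element are illustrative assumptions.

```python
from collections import Counter
from itertools import permutations

def cycles(perm):
    """Split a permutation into cycles; perm[i-1] is the image of i."""
    seen = set()
    result = []
    for start in range(1, len(perm) + 1):
        if start in seen:
            continue
        cycle = []
        m = start
        while m not in seen:  # follow m -> perm[m-1] until we return to start
            seen.add(m)
            cycle.append(m)
            m = perm[m - 1]
        result.append(cycle)
    return result

example = [int(d) for d in "384729156"]
print(cycles(example))  # [[1, 3, 4, 7], [2, 8, 5], [6, 9]]

# Count the permutations of {1,2,3,4} by their number of cycles.
counts = Counter(len(cycles(p)) for p in permutations(range(1, 5)))
print([counts[k] for k in range(1, 5)])  # [6, 11, 6, 1]
```

The second line of output is row 4 of Table 245, and its entries sum to 4! = 24.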
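Stirling numbers are useful because the recurrence relations (6.3) and (6.8) arise in a variety of problems. For example, if we want to represent ordinary powers $x^n$ by falling powers $x^{\underline{n}}$, we find that the first few cases are
$$x^0 = x^{\underline{0}};$$
$$x^1 = x^{\underline{1}};$$
$$x^2 = x^{\underline{2}} + x^{\underline{1}};$$
$$x^3 = x^{\underline{3}} + 3x^{\underline{2}} + x^{\underline{1}};$$
$$x^4 = x^{\underline{4}} + 6x^{\underline{3}} + 7x^{\underline{2}} + x^{\underline{1}}.$$
These coefficients look suspiciously like the numbers in Table 244, reflected between left and right; therefore we can be pretty confident that the general formula is
$$x^n = \sum_k{n\brace k}x^{\underline{k}},\qquad\text{integer $n\ge0$}. \tag{6.10}$$
We’d better define ${n\brace k} = {n\brack k} = 0$ when $k<0$ and $n\ge0$.
And sure enough, a simple proof by induction clinches the argument: We have $x\cdot x^{\underline{k}} = x^{\underline{k+1}} + kx^{\underline{k}}$, because $x^{\underline{k+1}} = x^{\underline{k}}(x-k)$; hence $x\cdot x^{n-1}$ is
$$x\sum_k{n-1\brace k}x^{\underline{k}} = \sum_k{n-1\brace k}x^{\underline{k+1}} + \sum_k{n-1\brace k}kx^{\underline{k}}$$
$$= \sum_k{n-1\brace k-1}x^{\underline{k}} + \sum_k{n-1\brace k}kx^{\underline{k}}$$
$$= \sum_k\Bigl(k{n-1\brace k} + {n-1\brace k-1}\Bigr)x^{\underline{k}} = \sum_k{n\brace k}x^{\underline{k}}.$$
In other words, Stirling subset numbers are the coefficients of factorial powers that yield ordinary powers.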
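Identity (6.10) can be spot-checked numerically. This is a minimal sketch in Python; the helpers `falling` and `stirling_subset` are illustrative names, and the boundary values are those assumed by recurrence (6.3). Since both sides are polynomials of degree $n$ in $x$, agreement at more than $n$ integer points already confirms the identity for that $n$.

```python
from functools import lru_cache

def falling(x, k):
    """Falling factorial power x(x-1)...(x-k+1)."""
    result = 1
    for j in range(k):
        result *= x - j
    return result

@lru_cache(maxsize=None)
def stirling_subset(n, k):
    """Stirling subset number for n, k >= 0, from recurrence (6.3)."""
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    return k * stirling_subset(n - 1, k) + stirling_subset(n - 1, k - 1)

# Check x^n = sum_k (n subset k) * falling(x, k) at several integer points.
for n in range(8):
    for x in range(-5, 10):
        assert x ** n == sum(stirling_subset(n, k) * falling(x, k)
                             for k in range(n + 1))
```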
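We can go the other way too, because Stirling cycle numbers are the coefficients of ordinary powers that yield factorial powers:
$$x^{\overline{0}} = x^0;$$
$$x^{\overline{1}} = x^1;$$
$$x^{\overline{2}} = x^2 + x^1;$$
$$x^{\overline{3}} = x^3 + 3x^2 + 2x^1;$$
$$x^{\overline{4}} = x^4 + 6x^3 + 11x^2 + 6x^1.$$
We have $(x+n-1)\cdot x^k = x^{k+1} + (n-1)x^k$, so a proof like the one just given shows that
$$(x+n-1)\,x^{\overline{n-1}} = (x+n-1)\sum_k{n-1\brack k}x^k = \sum_k{n\brack k}x^k.$$
This leads to a proof by induction of the general formula
$$x^{\overline{n}} = \sum_k{n\brack k}x^k,\qquad\text{integer $n\ge0$}. \tag{6.11}$$
(Setting $x = 1$ gives (6.9) again.)
But wait, you say. This equation involves rising factorial powers $x^{\overline{n}}$, while (6.10) involves falling factorials $x^{\underline{n}}$. What if we want to express $x^{\underline{n}}$ in terms of ordinary powers, or if we want to express $x^n$ in terms of rising powers? Easy; we just throw in some minus signs and get
$$x^n = \sum_k{n\brace k}(-1)^{n-k}x^{\overline{k}},\qquad\text{integer $n\ge0$}; \tag{6.12}$$
$$x^{\underline{n}} = \sum_k{n\brack k}(-1)^{n-k}x^k,\qquad\text{integer $n\ge0$}. \tag{6.13}$$
This works because, for example, the formula
$$x^{\underline{4}} = x(x-1)(x-2)(x-3) = x^4 - 6x^3 + 11x^2 - 6x$$
is just like the formula
$$x^{\overline{4}} = x(x+1)(x+2)(x+3) = x^4 + 6x^3 + 11x^2 + 6x$$
but with alternating signs. The general identity
$$x^{\underline{n}} = (-1)^n(-x)^{\overline{n}} \tag{6.14}$$
of exercise 2.17 converts (6.10) to (6.12) and (6.11) to (6.13) if we negate $x$.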
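The sign pattern relating (6.11) and (6.13) can be observed by expanding the factorial powers as polynomials. The following Python sketch multiplies out the linear factors one at a time; the helper names `times_linear`, `rising_coeffs`, and `falling_coeffs` are illustrative and do not come from the text.

```python
def times_linear(coeffs, c):
    """Multiply a polynomial by (x + c); coeffs[i] is the coefficient of x^i."""
    out = [0] * (len(coeffs) + 1)
    for i, a in enumerate(coeffs):
        out[i + 1] += a   # contribution of a * x^i * x
        out[i] += c * a   # contribution of a * x^i * c
    return out

def rising_coeffs(n):
    """Coefficients of x(x+1)...(x+n-1)."""
    coeffs = [1]
    for j in range(n):
        coeffs = times_linear(coeffs, j)
    return coeffs

def falling_coeffs(n):
    """Coefficients of x(x-1)...(x-n+1)."""
    coeffs = [1]
    for j in range(n):
        coeffs = times_linear(coeffs, -j)
    return coeffs

print(rising_coeffs(4))   # [0, 6, 11, 6, 1]
print(falling_coeffs(4))  # [0, -6, 11, -6, 1]
```

The rising coefficients are the cycle numbers of (6.11), and the falling coefficients differ from them only by the factor $(-1)^{n-k}$ of (6.13).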
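Table 250 Basic Stirling number identities, for integer $n\ge0$.
Recurrences:
$${n\brace k} = k{n-1\brace k} + {n-1\brace k-1}.$$
$${n\brack k} = (n-1){n-1\brack k} + {n-1\brack k-1}.$$
Special values:
$${n\brace0} = {n\brack0} = [n=0].$$
$${n\brace1} = [n>0];\qquad {n\brack1} = (n-1)!\,[n>0].$$
$${n\brace2} = (2^{n-1}-1)\,[n>0];\qquad {n\brack2} = (n-1)!\,H_{n-1}\,[n>0].$$
$${n\brace n-1} = {n\brack n-1} = \binom{n}{2}.$$
$${n\brace n} = {n\brack n} = \binom{n}{n} = 1.$$
$${n\brace k} = {n\brack k} = \binom{n}{k} = 0,\qquad\text{if $k>n$}.$$
Converting between powers:
$$x^n = \sum_k{n\brace k}x^{\underline{k}} = \sum_k{n\brace k}(-1)^{n-k}x^{\overline{k}}.$$
$$x^{\underline{n}} = \sum_k{n\brack k}(-1)^{n-k}x^k.$$
$$x^{\overline{n}} = \sum_k{n\brack k}x^k.$$
Inversion formulas:
$$\sum_k{n\brack k}{k\brace m}(-1)^{n-k} = [m=n];\qquad \sum_k{n\brace k}{k\brack m}(-1)^{n-k} = [m=n].$$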
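Table 251 Additional Stirling number identities, for integers $l,m,n\ge0$.
$${n+1\brace m+1} = \sum_k\binom{n}{k}{k\brace m}. \tag{6.15}$$
$${n+1\brack m+1} = \sum_k{n\brack k}\binom{k}{m}. \tag{6.16}$$
$${n\brace m} = \sum_k\binom{n}{k}{k+1\brace m+1}(-1)^{n-k}. \tag{6.17}$$
$${n\brack m} = \sum_k{n+1\brack k+1}\binom{k}{m}(-1)^{m-k}. \tag{6.18}$$
$$m!{n\brace m} = \sum_k\binom{m}{k}k^n(-1)^{m-k}. \tag{6.19}$$
$${n+1\brace m+1} = \sum_{k=0}^n{k\brace m}(m+1)^{n-k}. \tag{6.20}$$
$${n+1\brack m+1} = \sum_{k=0}^n{k\brack m}n^{\underline{n-k}} = n!\sum_{k=0}^n{k\brack m}\Big/k!. \tag{6.21}$$
$${m+n+1\brace m} = \sum_{k=0}^mk{n+k\brace k}. \tag{6.22}$$
$${m+n+1\brack m} = \sum_{k=0}^m(n+k){n+k\brack k}. \tag{6.23}$$
$$\binom{n}{m} = \sum_k{n+1\brace k+1}{k\brack m}(-1)^{m-k}. \tag{6.24}$$
$$(n-m)!\binom{n}{m}[n\ge m] = \sum_k{n+1\brack k+1}{k\brace m}(-1)^{m-k}. \tag{6.25}$$
$${n\brace n-m} = \sum_k\binom{m-n}{m+k}\binom{m+n}{n+k}{m+k\brack k}. \tag{6.26}$$
$${n\brack n-m} = \sum_k\binom{m-n}{m+k}\binom{m+n}{n+k}{m+k\brace k}. \tag{6.27}$$
$${n\brace l+m}\binom{l+m}{l} = \sum_k{k\brace l}{n-k\brace m}\binom{n}{k}. \tag{6.28}$$
$${n\brack l+m}\binom{l+m}{l} = \sum_k{k\brack l}{n-k\brack m}\binom{n}{k}. \tag{6.29}$$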
We can remember when to stick the $(-1)^{n-k}$ factor into a formula like (6.12) because there’s a natural ordering of powers when $x$ is large:
$$x^{\overline{n}} > x^n > x^{\underline{n}},\qquad\text{for all $x>n>1$}. \tag{6.30}$$
The Stirling numbers ${n\brack k}$ and ${n\brace k}$ are nonnegative, so we have to use minus signs when expanding a “small” power in terms of “large” ones.
We can plug (6.11) into (6.12) and get a double sum:
$$x^n = \sum_k{n\brace k}(-1)^{n-k}x^{\overline{k}} = \sum_{k,m}{n\brace k}{k\brack m}(-1)^{n-k}x^m.$$
This holds for all $x$, so the coefficients of $x^0$, $x^1$, …, $x^{n-1}$, $x^{n+1}$, $x^{n+2}$, … on the right must all be zero and we must have the identity
$$\sum_k{n\brace k}{k\brack m}(-1)^{n-k} = [m=n],\qquad\text{integers $m,n\ge0$}. \tag{6.31}$$
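The inversion identity says that the two Stirling triangles, with alternating signs attached to one of them, are inverse matrices. A small numerical check in Python might look like the sketch below; the names `stirling_tables` and `inversion_sum` are illustrative, and both tables are built from recurrences (6.3) and (6.8) with the usual boundary value 1 at $n = k = 0$.

```python
def stirling_tables(n_max):
    """Build subset[n][k] and cycle[n][k] for 0 <= n, k <= n_max."""
    size = n_max + 1
    subset = [[0] * size for _ in range(size)]
    cycle = [[0] * size for _ in range(size)]
    subset[0][0] = cycle[0][0] = 1
    for n in range(1, size):
        for k in range(1, n + 1):
            subset[n][k] = k * subset[n - 1][k] + subset[n - 1][k - 1]
            cycle[n][k] = (n - 1) * cycle[n - 1][k] + cycle[n - 1][k - 1]
    return subset, cycle

def inversion_sum(n, m, subset, cycle):
    """Sum over k of subset[n][k] * cycle[k][m] * (-1)^(n-k); terms with k > n vanish."""
    return sum(subset[n][k] * cycle[k][m] * (-1) ** (n - k)
               for k in range(n + 1))

subset, cycle = stirling_tables(8)
for n in range(9):
    for m in range(9):
        assert inversion_sum(n, m, subset, cycle) == (1 if m == n else 0)
```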
Stirling numbers, like binomial coefficients, satisfy many surprising identities. But these identities aren’t as versatile as the ones we had in Chapter 5, so they aren’t applied nearly as often. Therefore it’s best for us just to list the simplest ones, for future reference when a tough Stirling nut needs to be cracked. Tables 250 and 251 contain the formulas that are most frequently useful; the principal identities we have already derived are repeated there.
When we studied binomial coefficients in Chapter 5, we found that it was advantageous to define $\binom{n}{k}$ for negative $n$ in such a way that the identity $\binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1}$ is valid without any restrictions. Using that identity to extend the $\binom{n}{k}$’s beyond those with combinatorial significance, we discovered (in Table 164) that Pascal’s triangle essentially reproduces itself in a rotated form when we extend it upward. Let’s try the same thing with Stirling’s triangles: What happens if we decide that the basic recurrences
$${n\brace k} = k{n-1\brace k} + {n-1\brace k-1}$$
$${n\brack k} = (n-1){n-1\brack k} + {n-1\brack k-1}$$
are valid for all integers $n$ and $k$? The solution becomes unique if we make the reasonable additional stipulations that
$${0\brace k} = {0\brack k} = [k=0]\qquad\text{and}\qquad{n\brace0} = {n\brack0} = [n=0]. \tag{6.32}$$
Table 253 Stirling’s triangles in tandem. (The entry in row n, column k is ${n\brace k}$.)
 n   k=-5  k=-4  k=-3  k=-2  k=-1   k=0   k=1   k=2   k=3   k=4   k=5
-5      1
-4     10     1
-3     35     6     1
-2     50    11     3     1
-1     24     6     2     1     1
 0      0     0     0     0     0     1
 1      0     0     0     0     0     0     1
 2      0     0     0     0     0     0     1     1
 3      0     0     0     0     0     0     1     3     1
 4      0     0     0     0     0     0     1     7     6     1
 5      0     0     0     0     0     0     1    15    25    10     1
In fact, a surprisingly pretty pattern emerges: Stirling’s triangle for cycles appears above Stirling’s triangle for subsets, and vice versa! The two kinds of Stirling numbers are related by an extremely simple law:
$${n\brack k} = {-k\brace-n},\qquad\text{integers $k,n$}. \tag{6.33}$$
We have “duality,” something like the relations between min and max, between $\lfloor x\rfloor$ and $\lceil x\rceil$, between $x^{\underline{n}}$ and $x^{\overline{n}}$, between gcd and lcm. It’s easy to check that both of the recurrences ${n\brack k} = (n-1){n-1\brack k} + {n-1\brack k-1}$ and ${n\brace k} = k{n-1\brace k} + {n-1\brace k-1}$ amount to the same thing, under this correspondence.
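The duality law can be tested by running the subset recurrence backwards into negative indices. In the Python sketch below, `subset_neg` solves ${n+1\brace k+1} = (k+1){n\brace k+1} + {n\brace k}$ for ${n\brace k}$ when $n,k\le0$, using the stipulations (6.32) as boundary values; both function names are illustrative, and only the quadrant $n,k\le0$ is covered.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def cycle(n, k):
    """Stirling cycle number for n, k >= 0, from recurrence (6.8)."""
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    return (n - 1) * cycle(n - 1, k) + cycle(n - 1, k - 1)

@lru_cache(maxsize=None)
def subset_neg(n, k):
    """Stirling subset number extended to n, k <= 0 via the basic recurrence."""
    if n == 0:
        return 1 if k == 0 else 0  # stipulation for upper index 0
    if k == 0:
        return 0                   # stipulation for lower index 0, n != 0
    # Rearranged recurrence: {n,k} = {n+1,k+1} - (k+1){n,k+1}.
    return subset_neg(n + 1, k + 1) - (k + 1) * subset_neg(n, k + 1)

print([subset_neg(-2, k) for k in range(-5, 0)])  # [50, 11, 3, 1, 0]

# Duality: the cycle number [b, a] equals the subset number {-a, -b}.
for a in range(7):
    for b in range(7):
        assert subset_neg(-a, -b) == cycle(b, a)
```

The printed row matches row $-2$ of Table 253, with an explicit 0 where the table leaves a blank.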
6.2 EULERIAN NUMBERS
Another triangle of values pops up now and again, this one due to Euler [88, page 485], and we denote its elements by $\left\langle{n\atop k}\right\rangle$. The angle brackets in this case suggest “less than” and “greater than” signs; $\left\langle{n\atop k}\right\rangle$ is the number of permutations $\pi_1\pi_2\ldots\pi_n$ of $\{1,2,\ldots,n\}$ that have $k$ ascents, namely, $k$ places where $\pi_j<\pi_{j+1}$. (Caution: This notation is even less standard than our notations ${n\brack k}$, ${n\brace k}$ for Stirling numbers. But we’ll see that it makes good sense.)
For example, eleven permutations of $\{1,2,3,4\}$ have two ascents:
1324, 1423, 2314, 2413, 3412;
1243, 1342, 2341; 2134, 3124, 4123.
(The first row lists the permutations with $\pi_1<\pi_2>\pi_3<\pi_4$; the second row lists those with $\pi_1<\pi_2<\pi_3>\pi_4$ and $\pi_1>\pi_2<\pi_3<\pi_4$.) Hence $\left\langle{4\atop2}\right\rangle = 11$.
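The count of eleven can be confirmed by brute force over all 24 permutations. This is a minimal Python sketch; the function name `ascents` is an illustrative choice.

```python
from itertools import permutations

def ascents(perm):
    """Number of positions j with perm[j] < perm[j+1]."""
    return sum(1 for a, b in zip(perm, perm[1:]) if a < b)

two_ascents = sorted("".join(map(str, p))
                     for p in permutations(range(1, 5))
                     if ascents(p) == 2)
print(len(two_ascents))  # 11
print(two_ascents)
```

The sorted list contains exactly the eleven permutations displayed above.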
Table 254 Euler’s triangle. (The entry in row n, column k is $\left\langle{n\atop k}\right\rangle$.)
n    k=0     k=1     k=2     k=3     k=4     k=5     k=6     k=7     k=8     k=9
0      1
1      1       0
2      1       1       0
3      1       4       1       0
4      1      11      11       1       0
5      1      26      66      26       1       0
6      1      57     302     302      57       1       0
7      1     120    1191    2416    1191     120       1       0
8      1     247    4293   15619   15619    4293     247       1       0
9      1     502   14608   88234  156190   88234   14608     502       1       0
Table 254 lists the smallest Eulerian numbers; notice that the trademark sequence is 1, 11, 11, 1 this time. There can be at most $n-1$ ascents, when $n>0$, so we have $\left\langle{n\atop n}\right\rangle = [n=0]$ on the diagonal of the triangle.
Euler’s triangle, like Pascal’s, is symmetric between left and right. But in this case the symmetry law is slightly different:
$$\left\langle{n\atop k}\right\rangle = \left\langle{n\atop n-1-k}\right\rangle,\qquad\text{integer $n>0$}; \tag{6.34}$$
The permutation $\pi_1\pi_2\ldots\pi_n$ has $n-1-k$ ascents if and only if its “reflection” $\pi_n\ldots\pi_2\pi_1$ has $k$ ascents.
Let’s try to find a recurrence for $\left\langle{n\atop k}\right\rangle$.
Each permutation $\rho=\rho_1\ldots\rho_{n-1}$ of $\{1,\ldots,n-1\}$ leads to n
permutations of $\{1,2,\ldots,n\}$ if we insert the new element n in all
possible ways. Suppose we put n in position j, obtaining the permutation
$\pi=\rho_1\ldots\rho_{j-1}\,n\,\rho_j\ldots\rho_{n-1}$. The number of ascents
in $\pi$ is the same as the number in $\rho$, if $j=1$ or if
$\rho_{j-1}<\rho_j$; it’s one greater than the number in $\rho$, if
$\rho_{j-1}>\rho_j$ or if $j=n$. Therefore $\pi$ has k ascents in a total of
$(k+1)\left\langle{n-1\atop k}\right\rangle$ ways from permutations $\rho$ that
have k ascents, plus a total of
$\bigl((n-2)-(k-1)+1\bigr)\left\langle{n-1\atop k-1}\right\rangle$ ways from
permutations $\rho$ that have $k-1$ ascents. The desired recurrence is
$$\left\langle{n\atop k}\right\rangle = (k+1)\left\langle{n-1\atop k}\right\rangle + (n-k)\left\langle{n-1\atop k-1}\right\rangle,\qquad\text{integer } n>0. \tag{6.35}$$
Once again we start the recurrence off by setting
$$\left\langle{0\atop k}\right\rangle = [k=0],\qquad\text{integer } k, \tag{6.36}$$
and we will assume that $\left\langle{n\atop k}\right\rangle = 0$ when $k<0$.
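The recurrence (6.35) with base case (6.36) can be sketched in a few lines and cross-checked against a direct count of ascents. This is only an illustrative sketch; the function names `eulerian`, `ascents` and `eulerian_brute` are assumptions made for this example, not notation from the text.

```python
from functools import lru_cache
from itertools import permutations

@lru_cache(maxsize=None)
def eulerian(n, k):
    """Eulerian number <n,k> from recurrence (6.35), base case (6.36)."""
    if k < 0:
        return 0
    if n == 0:
        return 1 if k == 0 else 0
    return (k + 1) * eulerian(n - 1, k) + (n - k) * eulerian(n - 1, k - 1)

def ascents(p):
    """Number of places j where p[j] < p[j+1]."""
    return sum(a < b for a, b in zip(p, p[1:]))

def eulerian_brute(n, k):
    """Count the permutations of {1,...,n} that have exactly k ascents."""
    return sum(ascents(p) == k for p in permutations(range(1, n + 1)))

print([eulerian(4, k) for k in range(5)])
```

The brute-force count is factorial in n, so it serves only as a check on small rows of the triangle.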
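Eulerian numbers are useful primarily because they provide an unusual
connection between ordinary powers and consecutive binomial coefficients:
$$x^n = \sum_k \left\langle{n\atop k}\right\rangle\binom{x+k}{n},\qquad\text{integer } n\ge0. \tag{6.37}$$
(This is “Worpitzky’s identity” [308].) For example, we have
$$x^2 = \binom{x}{2}+\binom{x+1}{2},$$
$$x^3 = \binom{x}{3}+4\binom{x+1}{3}+\binom{x+2}{3},$$
$$x^4 = \binom{x}{4}+11\binom{x+1}{4}+11\binom{x+2}{4}+\binom{x+3}{4},$$
and so on. It’s easy to prove (6.37) by induction (exercise 14).
Incidentally, (6.37) gives us yet another way to obtain the sum of the
first n squares: We have
$k^2 = \left\langle{2\atop 0}\right\rangle\binom{k}{2} + \left\langle{2\atop 1}\right\rangle\binom{k+1}{2} = \binom{k}{2}+\binom{k+1}{2}$,
hence
$$1^2+2^2+\cdots+n^2 = \Bigl(\binom{1}{2}+\binom{2}{2}+\cdots+\binom{n}{2}\Bigr) + \Bigl(\binom{2}{2}+\binom{3}{2}+\cdots+\binom{n+1}{2}\Bigr)$$
$$= \binom{n+1}{3}+\binom{n+2}{3} = \tfrac16(n+1)n\bigl((n-1)+(n+2)\bigr).$$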
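Worpitzky’s identity (6.37) and the resulting formula for the sum of squares are easy to spot-check numerically. The sketch below assumes nonnegative integer x so that `math.comb` applies; the helper names `eulerian_row`, `worpitzky` and `sum_of_squares` are hypothetical.

```python
from math import comb

def eulerian_row(n):
    """Row n of Euler's triangle, built up with recurrence (6.35)."""
    row = [1]                                   # <0,0> = 1
    for m in range(1, n + 1):
        prev = row + [0]
        row = [(k + 1) * prev[k] + (m - k) * (prev[k - 1] if k else 0)
               for k in range(m + 1)]
    return row

def worpitzky(x, n):
    """Right-hand side of (6.37) for a nonnegative integer x."""
    return sum(e * comb(x + k, n) for k, e in enumerate(eulerian_row(n)))

def sum_of_squares(n):
    """1^2 + ... + n^2 written as binom(n+1,3) + binom(n+2,3)."""
    return comb(n + 1, 3) + comb(n + 2, 3)
```

For instance, `worpitzky(5, 3)` should agree with `5 ** 3`, and `sum_of_squares(10)` with the directly computed sum of the first ten squares.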
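The Eulerian recurrence (6.35) is a bit more complicated than the Stirling
recurrences (6.3) and (6.8), so we don’t expect the numbers
$\left\langle{n\atop k}\right\rangle$ to satisfy as many simple identities.
Still, there are a few:
$$\left\langle{n\atop m}\right\rangle = \sum_{k=0}^{m}\binom{n+1}{k}(m+1-k)^n(-1)^k; \tag{6.38}$$
$$m!\left\{{n\atop m}\right\} = \sum_k \left\langle{n\atop k}\right\rangle\binom{k}{n-m}; \tag{6.39}$$
$$\left\langle{n\atop m}\right\rangle = \sum_k \left\{{n\atop k}\right\}\binom{n-k}{m}(-1)^{n-k-m}\,k!. \tag{6.40}$$
If we multiply (6.39) by $z^{n-m}$ and sum on m, we get
$\sum_m \left\{{n\atop m}\right\} m!\,z^{n-m} = \sum_k \left\langle{n\atop k}\right\rangle (z+1)^k$.
Replacing z by $z-1$ and equating coefficients of $z^k$ gives (6.40). Thus the
last two of these identities are essentially equivalent. The first identity,
(6.38), gives us special values when m is small:
$$\left\langle{n\atop 0}\right\rangle = 1;\qquad \left\langle{n\atop 1}\right\rangle = 2^n-n-1;\qquad \left\langle{n\atop 2}\right\rangle = 3^n-(n+1)2^n+\binom{n+1}{2}.$$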
Table 256 Second-order Eulerian triangle (row n, column k holds $\left\langle\!\!\left\langle{n\atop k}\right\rangle\!\!\right\rangle$).
n | k=0    1      2       3       4       5       6      7  8
0 |   1
1 |   1    0
2 |   1    2      0
3 |   1    8      6       0
4 |   1   22     58      24       0
5 |   1   52    328     444     120       0
6 |   1  114   1452    4400    3708     720       0
7 |   1  240   5610   32120   58140   33984    5040      0
8 |   1  494  19950  195800  644020  785304  341136  40320  0
We needn’t dwell further on Eulerian numbers here; it’s usually sufficient
simply to know that they exist, and to have a list of basic identities to fall
back on when the need arises. However, before we leave this topic, we should
take note of yet another triangular pattern of coefficients, shown in Table 256.
We call these “second-order Eulerian numbers”
$\left\langle\!\!\left\langle{n\atop k}\right\rangle\!\!\right\rangle$, because
they satisfy a recurrence similar to (6.35) but with n replaced by $2n-1$ in
one place:
$$\left\langle\!\!\left\langle{n\atop k}\right\rangle\!\!\right\rangle = (k+1)\left\langle\!\!\left\langle{n-1\atop k}\right\rangle\!\!\right\rangle + (2n-1-k)\left\langle\!\!\left\langle{n-1\atop k-1}\right\rangle\!\!\right\rangle. \tag{6.41}$$
These numbers have a curious combinatorial interpretation, first noticed by
Gessel and Stanley [118]: If we form permutations of the multiset
$\{1,1,2,2,\ldots,n,n\}$ with the special property that all numbers between the
two occurrences of m are greater than m, for $1\le m\le n$, then
$\left\langle\!\!\left\langle{n\atop k}\right\rangle\!\!\right\rangle$ is the
number of such permutations that have k ascents. For example, there are eight
suitable single-ascent permutations of $\{1,1,2,2,3,3\}$:
113322, 133221, 221331, 221133,
223311, 233211, 331122, 331221.
Thus $\left\langle\!\!\left\langle{3\atop 1}\right\rangle\!\!\right\rangle = 8$.
The multiset $\{1,1,2,2,\ldots,n,n\}$ has a total of
$$\sum_k \left\langle\!\!\left\langle{n\atop k}\right\rangle\!\!\right\rangle = (2n-1)(2n-3)\ldots(1) = \frac{(2n)!}{2^n\,n!} \tag{6.42}$$
suitable permutations, because the two appearances of n must be adjacent
and there are $2n-1$ places to insert them within a permutation for $n-1$.
For example, when n = 3 the permutation 1221 has five insertion points,
yielding 331221, 133221, 123321, 122331, and 122133. Recurrence (6.41) can
be proved by extending the argument we used for ordinary Eulerian numbers.
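As a rough check on the combinatorial interpretation, one can build a row of the second-order triangle from recurrence (6.41) and compare it with a brute-force count over multiset permutations. The names `second_order_row`, `suitable` and `brute_row` below are assumptions for this sketch, and the brute force is practical only for very small n.

```python
from itertools import permutations

def second_order_row(n):
    """Row n of the second-order Eulerian triangle via recurrence (6.41)."""
    row = [1]
    for m in range(1, n + 1):
        prev = row + [0]
        row = [(k + 1) * prev[k] + (2 * m - 1 - k) * (prev[k - 1] if k else 0)
               for k in range(m + 1)]
    return row

def suitable(p):
    """True if everything between the two occurrences of each m exceeds m."""
    for m in set(p):
        i = p.index(m)
        j = len(p) - 1 - p[::-1].index(m)
        if any(x < m for x in p[i + 1:j]):
            return False
    return True

def brute_row(n):
    """Count suitable permutations of {1,1,...,n,n} by number of ascents."""
    multiset = [m for m in range(1, n + 1) for _ in range(2)]
    counts = [0] * (n + 1)
    for p in set(permutations(multiset)):
        if suitable(p):
            counts[sum(a < b for a, b in zip(p, p[1:]))] += 1
    return counts
```

Summing a row of `second_order_row` gives the odd-factorial product in (6.42).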
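Second-order Eulerian numbers are important chiefly because of their
connection with Stirling numbers [119]: We have, by induction on n,
$$\left\{{x\atop x-n}\right\} = \sum_k \left\langle\!\!\left\langle{n\atop k}\right\rangle\!\!\right\rangle\binom{x+n-1-k}{2n},\qquad\text{integer } n\ge0; \tag{6.43}$$
$$\left[{x\atop x-n}\right] = \sum_k \left\langle\!\!\left\langle{n\atop k}\right\rangle\!\!\right\rangle\binom{x+k}{2n},\qquad\text{integer } n\ge0. \tag{6.44}$$
For example,
$$\left\{{x\atop x-1}\right\} = \binom{x}{2},\qquad \left[{x\atop x-1}\right] = \binom{x}{2};$$
$$\left\{{x\atop x-2}\right\} = \binom{x+1}{4}+2\binom{x}{4},\qquad \left[{x\atop x-2}\right] = \binom{x}{4}+2\binom{x+1}{4};$$
$$\left\{{x\atop x-3}\right\} = \binom{x+2}{6}+8\binom{x+1}{6}+6\binom{x}{6},\qquad \left[{x\atop x-3}\right] = \binom{x}{6}+8\binom{x+1}{6}+6\binom{x+2}{6}.$$
(We already encountered the case n = 1 in (6.7).) These identities hold
whenever x is an integer and n is a nonnegative integer. Since the right-hand
sides are polynomials in x, we can use (6.43) and (6.44) to define Stirling
numbers $\left\{{x\atop x-n}\right\}$ and $\left[{x\atop x-n}\right]$ for
arbitrary real (or complex) values of x.
If n > 0, these polynomials $\left\{{x\atop x-n}\right\}$ and
$\left[{x\atop x-n}\right]$ are zero when x = 0, x = 1, ..., and x = n;
therefore they are divisible by (x-0), (x-1), ..., and (x-n).
It’s interesting to look at what’s left after these known factors are divided out.
We define the Stirling polynomials $\sigma_n(x)$ by the rule
$$\sigma_n(x) = \left[{x\atop x-n}\right] \Big/ \bigl(x(x-1)\ldots(x-n)\bigr). \tag{6.45}$$
(The degree of $\sigma_n(x)$ is $n-1$.) The first few cases are
$$\sigma_0(x) = 1/x;$$
$$\sigma_1(x) = 1/2;$$
$$\sigma_2(x) = (3x-1)/24;$$
$$\sigma_3(x) = (x^2-x)/48;$$
$$\sigma_4(x) = (15x^3-30x^2+5x+2)/5760.$$
[Margin note: So 1/x is a polynomial? (Sorry about that.)]
They can be computed via the second-order Eulerian numbers; for example,
$$\sigma_3(x) = \bigl((x-4)(x-5)+8(x+1)(x-4)+6(x+2)(x+1)\bigr)/6!.$$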
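Table 258 Stirling convolution formulas.
$$rs\sum_{k=0}^{n}\sigma_k(r)\,\sigma_{n-k}(s) = (r+s)\,\sigma_n(r+s) \tag{6.46}$$
$$s\sum_{k=0}^{n}k\,\sigma_k(r)\,\sigma_{n-k}(s) = n\,\sigma_n(r+s) \tag{6.47}$$
$$rs\sum_{k=0}^{n}\sigma_k(r+k)\,\sigma_{n-k}(s+n-k) = (r+s)\,\sigma_n(r+s+n) \tag{6.48}$$
$$s\sum_{k=0}^{n}k\,\sigma_k(r+k)\,\sigma_{n-k}(s+n-k) = n\,\sigma_n(r+s+n) \tag{6.49}$$
$$\left\{{n\atop m}\right\} = (-1)^{n-m+1}\frac{n!}{(m-1)!}\,\sigma_{n-m}(-m) \tag{6.50}$$
$$\left[{n\atop m}\right] = \frac{n!}{(m-1)!}\,\sigma_{n-m}(n) \tag{6.51}$$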
It turns out that these polynomials satisfy two very pretty identities:
$$\Bigl(\frac{ze^z}{e^z-1}\Bigr)^{x} = x\sum_{n\ge0}\sigma_n(x)\,z^n; \tag{6.52}$$
$$\Bigl(\frac1z\ln\frac1{1-z}\Bigr)^{x} = x\sum_{n\ge0}\sigma_n(x+n)\,z^n. \tag{6.53}$$
Therefore we can obtain general convolution formulas for Stirling numbers, as
we did for binomial coefficients in Table 202; the results appear in Table 258.
When a sum of Stirling numbers doesn’t fit the identities of Table 250 or 251,
Table 258 may be just the ticket. (An example appears later in this chapter,
following equation (6.100). Exercise 7.19 discusses the general principles of
convolutions based on identities like (6.52) and (6.53).)
6.3 HARMONIC NUMBERS
It’s time now to take a closer look at harmonic numbers, which we
first met back in Chapter 2:
$$H_n = 1+\frac12+\frac13+\cdots+\frac1n = \sum_{k=1}^{n}\frac1k,\qquad\text{integer } n\ge0. \tag{6.54}$$
These numbers appear so often in the analysis of algorithms that computer
scientists need a special notation for them. We use $H_n$, the ‘H’ standing for
“harmonic,” since a tone of wavelength 1/n is called the nth harmonic of a
tone whose wavelength is 1. The first few values look like this:
n    0  1  2    3     4      5       6      7        8        9          10
H_n  0  1  3/2  11/6  25/12  137/60  49/20  363/140  761/280  7129/2520  7381/2520
[Margin note: This must be Table 259.]
Exercise 21 shows that $H_n$ is never an integer when n > 1.
Here’s a card trick, based on an idea by R. T. Sharp
[264],
that illustrates
how the harmonic numbers arise naturally in simple situations. Given n cards
and a table, we’d like to create the largest possible overhang by stacking the
cards up over the table’s edge, subject to the laws of gravity:
To define the problem a bit more, we require the edges of the cards to be
parallel to the edge of the table; otherwise we could increase the overhang by
rotating the cards so that their corners stick out a little farther. And to make
the answer simpler, we assume that each card is 2 units long.
With one card, we get maximum overhang when its center of gravity is
just above the edge of the table. The center of gravity is in the middle of the
card, so we can create half a cardlength, or 1 unit, of overhang.
With two cards, it’s not hard to convince ourselves that we get maximum
overhang when the center of gravity of the top card is just above the edge
of the second card, and the center of gravity of both cards combined is just
above the edge of the table. The joint center of gravity of two cards will be
in the middle of their common part, so we are able to achieve an additional
half unit of overhang.
This pattern suggests a general method, where we place cards so that the
center of gravity of the top k cards lies just above the edge of the (k+1)st
card (which supports those top k). The table plays the role of the (n+1)st
card. To express this condition algebraically, we can let $d_k$ be the distance
from the extreme edge of the top card to the corresponding edge of the kth card
from the top. Then $d_1 = 0$, and we want to make $d_{k+1}$ the center of
gravity of the first k cards:
$$d_{k+1} = \frac{(d_1+1)+(d_2+1)+\cdots+(d_k+1)}{k},\qquad\text{for } 1\le k\le n. \tag{6.55}$$
(The center of gravity of k objects, having respective weights $w_1,\ldots,w_k$
and having respective centers of gravity at positions $p_1,\ldots,p_k$, is at
position $(w_1p_1+\cdots+w_kp_k)/(w_1+\cdots+w_k)$.) We can rewrite this
recurrence in two equivalent forms
$$k\,d_{k+1} = k+d_1+\cdots+d_{k-1}+d_k,\qquad k\ge0;$$
$$(k-1)\,d_k = k-1+d_1+\cdots+d_{k-1},\qquad k\ge1.$$
Subtracting these equations tells us that
$$k\,d_{k+1}-(k-1)\,d_k = 1+d_k,\qquad k\ge1;$$
hence $d_{k+1} = d_k+1/k$. The second card will be offset half a unit past the
third, which is a third of a unit past the fourth, and so on. The general
formula
$$d_{k+1} = H_k \tag{6.56}$$
follows by induction, and if we set k = n we get $d_{n+1} = H_n$ as the total
overhang when n cards are stacked as described.
Could we achieve greater overhang by holding back, not pushing each
card to an extreme position but storing up “potential gravitational energy”
for a later advance? No; any well-balanced card placement has
$$d_{k+1} \le \frac{(1+d_1)+(1+d_2)+\cdots+(1+d_k)}{k},\qquad 1\le k\le n.$$
Furthermore $d_1 = 0$. It follows by induction that $d_{k+1}\le H_k$.
Notice that it doesn’t take too many cards for the top one to be completely
past the edge of the table. We need an overhang of more than one
cardlength, which is 2 units. The first harmonic number to exceed 2 is
$H_4 = \frac{25}{12}$, so we need only four cards.
And with 52 cards we have an $H_{52}$-unit overhang, which turns out to be
$H_{52}/2\approx2.27$ cardlengths. (We will soon learn a formula that tells us
how to compute an approximate value of $H_n$ for large n without adding up a
whole bunch of fractions.)
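The overhang argument can be replayed with exact rational arithmetic. The sketch below applies the center-of-gravity recurrence (6.55) directly and compares it with $H_k$; the names `harmonic`, `overhangs` and `cards_needed` are hypothetical helpers chosen for this illustration.

```python
from fractions import Fraction

def harmonic(n):
    """H_n as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))

def overhangs(n):
    """Return [d_1, ..., d_{n+1}] computed from recurrence (6.55)."""
    d = [Fraction(0)]                                  # d_1 = 0
    for k in range(1, n + 1):
        d.append(sum(x + 1 for x in d[:k]) / k)        # center of gravity of top k cards
    return d                                           # d[k] holds d_{k+1}

def cards_needed(target_units):
    """Smallest n with H_n > target_units (a card is 2 units long)."""
    n, h = 0, Fraction(0)
    while h <= target_units:
        n += 1
        h += Fraction(1, n)
    return n

print(cards_needed(2), float(harmonic(52)) / 2)
```

Here `overhangs(n)[k]` should equal `harmonic(k)`, which is formula (6.56).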
An amusing problem called the “worm on the rubber band” shows har-
monic numbers in another guise. A slow but persistent worm, W, starts at
one end of a meter-long rubber band and crawls one centimeter per minute
toward the other end. At the end of each minute, an equally persistent keeper
of the band, K, whose sole purpose in life is to frustrate W, stretches it one
meter. Thus after one minute of crawling, W is 1 centimeter from the start
and 99 from the finish; then K stretches it one meter. During the stretching
operation W maintains his relative position, 1% from the start and 99% from
the finish; so W is now 2 cm from the starting point and 198 cm from the
goal. After W crawls for another minute the score is 3 cm traveled and 197
to go; but K stretches, and the distances become 4.5 and 295.5. And so on.
Does the worm ever reach the finish? He keeps moving, but the goal seems to
move away even faster. (We’re assuming an infinite longevity for K and W,
an infinite elasticity of the band, and an infinitely tiny worm.)
[Margin note: Anyone who actually tries to achieve this maximum overhang with
52 cards is probably not dealing with a full deck, or maybe he’s a real joker.]
[Margin note: Metric units make this problem more scientific.]
Let’s write down some formulas. When K stretches the rubber band, the
fraction of it that W has crawled stays the same. Thus he crawls 1/100th of
it the first minute, 1/200th the second, 1/300th the third, and so on. After
n minutes the fraction of the band that he’s crawled is
$$\frac1{100}\Bigl(\frac11+\frac12+\frac13+\cdots+\frac1n\Bigr) = \frac{H_n}{100}. \tag{6.57}$$
So he reaches the finish if $H_n$ ever surpasses 100.
[Margin note: A flatworm, eh?]
We’ll see how to estimate $H_n$ for large n soon; for now, let’s simply
check our analysis by considering how “Superworm” would perform in the
same situation. Superworm, unlike W, can crawl 50 cm per minute; so she
will crawl $H_n/2$ of the band length after n minutes, according to the argument
we just gave. If our reasoning is correct, Superworm should finish before n
reaches 4, since $H_4 > 2$. And yes, a simple calculation shows that Superworm
has only $33\frac13$ cm left to travel after three minutes have elapsed. She
finishes in 3 minutes and 40 seconds flat.
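The “simple calculation” for Superworm can be reproduced by simulating the crawl-then-stretch cycle exactly. This is a minimal sketch; the function `crawl` and its parameter names are assumptions for illustration, and it only models whole minutes before the worm finishes.

```python
from fractions import Fraction

def crawl(speed_cm, minutes, band_cm=100, stretch_cm=100):
    """Return (position, band length) in cm after `minutes` crawl-and-stretch rounds."""
    pos, length = Fraction(0), Fraction(band_cm)
    for _ in range(minutes):
        pos += speed_cm                                   # worm crawls for one minute
        scale = (length + stretch_cm) / length            # keeper stretches the band
        pos, length = pos * scale, length + stretch_cm    # relative position is preserved
    return pos, length

pos, length = crawl(50, 3)
remaining = length - pos              # distance Superworm still has to cover
print(remaining, remaining / 50 * 60) # cm left, and seconds needed at 50 cm per minute
```

Calling `crawl(1, 2)` reproduces the distances 4.5 and 295.5 quoted for the ordinary worm W.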
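Harmonic numbers appear also in Stirling’s triangle. Let’s try to find a
closed form for $\left[{n\atop 2}\right]$, the number of permutations of n
objects that have exactly two cycles. Recurrence (6.8) tells us that
$$\left[{n+1\atop 2}\right] = n\left[{n\atop 2}\right]+\left[{n\atop 1}\right] = n\left[{n\atop 2}\right]+(n-1)!,\qquad\text{if } n>0;$$
and this recurrence is a natural candidate for the summation factor technique
of Chapter 2:
$$\frac1{n!}\left[{n+1\atop 2}\right] = \frac1{(n-1)!}\left[{n\atop 2}\right]+\frac1n.$$
Unfolding this recurrence tells us that
$\frac1{n!}\left[{n+1\atop 2}\right] = H_n$; hence
$$\left[{n+1\atop 2}\right] = n!\,H_n. \tag{6.58}$$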
We proved in Chapter 2 that the harmonic series $\sum_k 1/k$ diverges, which
means that $H_n$ gets arbitrarily large as $n\to\infty$. But our proof was
indirect; we found that a certain infinite sum (2.58) gave different answers
when it was rearranged, hence $\sum_k 1/k$ could not be bounded. The fact that
$H_n\to\infty$ seems counter-intuitive, because it implies among other things
that a large enough stack of cards will overhang a table by a mile or more, and
that the worm W will eventually reach the end of his rope. Let us therefore
take a closer look at the size of $H_n$ when n is large.
The simplest way to see that $H_n\to\infty$ is probably to group its terms
according to powers of 2. We put one term into group 1, two terms into
group 2, four into group 3, eight into group 4, and so on:
$$\underbrace{\frac11}_{\text{group 1}} + \underbrace{\frac12+\frac13}_{\text{group 2}} + \underbrace{\frac14+\frac15+\frac16+\frac17}_{\text{group 3}} + \underbrace{\frac18+\frac19+\frac1{10}+\frac1{11}+\frac1{12}+\frac1{13}+\frac1{14}+\frac1{15}}_{\text{group 4}} + \cdots.$$
Both terms in group 2 are between $\frac14$ and $\frac12$, so the sum of that
group is between $2\cdot\frac14=\frac12$ and $2\cdot\frac12=1$. All four terms
in group 3 are between $\frac18$ and $\frac14$, so their sum is also between
$\frac12$ and 1. In fact, each of the $2^{k-1}$ terms in group k is between
$2^{-k}$ and $2^{1-k}$; hence the sum of each individual group is between
$\frac12$ and 1.
This grouping procedure tells us that if n is in group k, we must have
$H_n > k/2$ and $H_n\le k$ (by induction on k). Thus $H_n\to\infty$, and in fact
$$\frac{\lfloor\lg n\rfloor+1}{2} < H_n \le \lfloor\lg n\rfloor+1. \tag{6.59}$$
We now know $H_n$ within a factor of 2. Although the harmonic numbers
approach infinity, they approach it only logarithmically, that is, quite slowly.
[Margin note: We should call them the worm numbers, they’re so slow.]
Better bounds can be found with just a little more work and a dose
of calculus. We learned in Chapter 2 that $H_n$ is the discrete analog of the
continuous function $\ln n$. The natural logarithm is defined as the area under
a curve, so a geometric comparison is suggested:
[Figure: the curve f(x) = 1/x plotted against x, with rectangles over the unit intervals; x-axis ticks 0, 1, 2, 3, ..., n, n+1.]
The area under the curve between 1 and n, which is $\int_1^n dx/x = \ln n$, is
less than the area of the n rectangles, which is $\sum_{k=1}^n 1/k = H_n$. Thus
$\ln n < H_n$; this is a sharper result than we had in (6.59). And by placing
the rectangles a little differently, we get a similar upper bound:
[Figure: the same curve with the rectangles shifted; x-axis ticks 0, 1, 2, 3, ..., n.]
[Margin note: “I now see a way too how ye aggregate of ye termes of Musicall
progressions may bee found (much after ye same manner) by Logarithms, but ye
calculations for finding out those rules would bee still more troublesom.”
-I. Newton [223]]
This time the area of the n rectangles, $H_n$, is less than the area of the
first rectangle plus the area under the curve. We have proved that
$$\ln n < H_n < \ln n+1,\qquad\text{for } n>1. \tag{6.60}$$
We now know the value of $H_n$ with an error of at most 1.
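Both pairs of bounds, (6.59) and (6.60), can be checked mechanically for small n. The sketch below is illustrative only; `check_bounds` is a hypothetical name, and the logarithms are ordinary floating-point values compared against exact fractions.

```python
from fractions import Fraction
from math import log

def check_bounds(n_max):
    """Assert (6.59) for 1 <= n <= n_max and (6.60) for 1 < n <= n_max."""
    h = Fraction(0)
    for n in range(1, n_max + 1):
        h += Fraction(1, n)                        # h is now H_n exactly
        lg = n.bit_length() - 1                    # floor(lg n) for n >= 1
        assert Fraction(lg + 1, 2) < h <= lg + 1   # bounds (6.59)
        if n > 1:
            assert log(n) < h < log(n) + 1         # bounds (6.60)
    return True

print(check_bounds(500))
```

The integer trick `n.bit_length() - 1` avoids floating-point error when n is an exact power of 2.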
“Second order” harmonic numbers $H_n^{(2)}$ arise when we sum the squares
of the reciprocals, instead of summing simply the reciprocals:
$$H_n^{(2)} = 1+\frac14+\frac19+\cdots+\frac1{n^2} = \sum_{k=1}^{n}\frac1{k^2}.$$
Similarly, we define harmonic numbers of order r by summing (-r)th powers:
$$H_n^{(r)} = \sum_{k=1}^{n}\frac1{k^r}. \tag{6.61}$$
If r > 1, these numbers approach a limit as $n\to\infty$; we noted in Chapter 4
that this limit is conventionally called Riemann’s zeta function:
$$\zeta(r) = H_\infty^{(r)} = \sum_{k\ge1}\frac1{k^r}. \tag{6.62}$$
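Euler discovered a neat way to use generalized harmonic numbers to
approximate the ordinary ones, $H_n^{(1)}$. Let’s consider the infinite series
$$\ln\Bigl(\frac{k}{k-1}\Bigr) = \frac1k+\frac1{2k^2}+\frac1{3k^3}+\frac1{4k^4}+\cdots, \tag{6.63}$$
which converges when k > 1. The left-hand side is $\ln k-\ln(k-1)$; therefore
if we sum both sides for $2\le k\le n$ the left-hand sum telescopes and we get
$$\ln n-\ln 1 = \sum_{k=2}^{n}\Bigl(\frac1k+\frac1{2k^2}+\frac1{3k^3}+\frac1{4k^4}+\cdots\Bigr)$$
$$= (H_n-1)+\tfrac12\bigl(H_n^{(2)}-1\bigr)+\tfrac13\bigl(H_n^{(3)}-1\bigr)+\tfrac14\bigl(H_n^{(4)}-1\bigr)+\cdots.$$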
Rearranging, we have an expression for the difference between $H_n$ and $\ln n$:
$$H_n-\ln n = 1-\tfrac12\bigl(H_n^{(2)}-1\bigr)-\tfrac13\bigl(H_n^{(3)}-1\bigr)-\tfrac14\bigl(H_n^{(4)}-1\bigr)-\cdots.$$
When $n\to\infty$, the right-hand side approaches the limiting value
$$1-\tfrac12\bigl(\zeta(2)-1\bigr)-\tfrac13\bigl(\zeta(3)-1\bigr)-\tfrac14\bigl(\zeta(4)-1\bigr)-\cdots,$$
which is now known as Euler’s constant and conventionally denoted by the
Greek letter $\gamma$. In fact, $\zeta(r)-1$ is approximately $1/2^r$, so this
infinite series converges rather rapidly and we can compute the decimal value
$$\gamma = 0.5772156649\ldots. \tag{6.64}$$
[Margin note: “Huius igitur quantitatis constantis C valorem deteximus, quippe
est C = 0,577218.”]
Euler’s argument establishes the limiting relation
$$\lim_{n\to\infty}(H_n-\ln n) = \gamma; \tag{6.65}$$
thus $H_n$ lies about 58% of the way between the two extremes in (6.60). We
are gradually homing in on its value.
Further refinements are possible, as we will see in Chapter 9. We will
prove, for example, that
$$H_n = \ln n+\gamma+\frac1{2n}-\frac1{12n^2}+\frac{\epsilon_n}{120n^4},\qquad 0<\epsilon_n<1. \tag{6.66}$$
This formula allows us to conclude that the millionth harmonic number is
$$H_{1000000} \approx 14.3927267228657236313811275,$$
without adding up a million fractions. Among other things, this implies that
a stack of a million cards can overhang the edge of a table by more than seven
cardlengths.
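A quick floating-point experiment shows how well (6.66) works even when the $\epsilon_n$ term is dropped. This is a sketch under stated assumptions: `GAMMA` is Euler’s constant truncated to double precision, and `harmonic_estimate` and `harmonic_direct` are hypothetical helper names.

```python
from math import fsum, log

GAMMA = 0.5772156649015329   # Euler's constant, to double precision (assumed value)

def harmonic_estimate(n):
    """ln n + gamma + 1/(2n) - 1/(12 n^2): formula (6.66) without the epsilon term."""
    return log(n) + GAMMA + 1 / (2 * n) - 1 / (12 * n * n)

def harmonic_direct(n):
    """H_n by accurately rounded floating-point summation of the reciprocals."""
    return fsum(1 / k for k in range(1, n + 1))

print(harmonic_estimate(10 ** 6))
print(harmonic_direct(10 ** 6))
```

The omitted term is below $1/(120n^4)$, far smaller than double-precision rounding for large n, so the two printed values should agree to roughly the limits of the arithmetic.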
What does (6.66) tell us about the worm on the rubber band? Since $H_n$ is
unbounded, the worm will definitely reach the end, when $H_n$ first exceeds 100.
Our approximation to $H_n$ says that this will happen when n is approximately
$$e^{100-\gamma} \approx e^{99.423}.$$
In fact, exercise 9.49 proves that the critical value of n is either
$\lfloor e^{100-\gamma}\rfloor$ or $\lceil e^{100-\gamma}\rceil$.
We can imagine W’s triumph when he crosses the finish line at last,
much to K’s chagrin, some 287 decillion centuries after his long crawl began.
(The rubber band will have stretched to more than $10^{27}$ light years long;
its molecules will be pretty far apart.)
[Margin note: Well, they can’t really go at it this long; the world will have
ended much earlier, when the Tower of Brahma is fully transferred.]
6.4 HARMONIC SUMMATION
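Now let’s look at some sums involving harmonic numbers, starting
with a review of a few ideas we learned in Chapter 2. We proved in (2.36)
and (2.57) that
$$\sum_{0\le k<n} H_k = nH_n-n; \tag{6.67}$$
$$\sum_{0\le k<n} kH_k = \frac{n(n-1)}{2}H_n-\frac{n(n-1)}{4}. \tag{6.68}$$
Let’s be bold and take on a more general sum, which includes both of these
as special cases: What is the value of
$$\sum_{0\le k<n}\binom{k}{m}H_k,$$
when m is a nonnegative integer?
The approach that worked best for (6.67) and (6.68) in Chapter 2 was
called summation by parts. We wrote the summand in the form $u(k)\Delta v(k)$,
and we applied the general identity
$$\sum\nolimits_a^b u(x)\Delta v(x)\,\delta x = u(x)v(x)\Big|_a^b-\sum\nolimits_a^b v(x+1)\Delta u(x)\,\delta x. \tag{6.69}$$
Remember? The sum that faces us now, $\sum_{0\le k<n}\binom{k}{m}H_k$, is a
natural for this method because we can let
$$u(k) = H_k,\qquad \Delta u(k) = H_{k+1}-H_k = \frac1{k+1};$$
$$v(k) = \binom{k}{m+1},\qquad \Delta v(k) = \binom{k+1}{m+1}-\binom{k}{m+1} = \binom{k}{m}.$$
(In other words, harmonic numbers have a simple $\Delta$ and binomial
coefficients have a simple $\Delta^{-1}$, so we’re in business.) Plugging into
(6.69) yields
$$\sum_{0\le k<n}\binom{k}{m}H_k = \sum\nolimits_0^n\binom{x}{m}H_x\,\delta x = \binom{x}{m+1}H_x\Big|_0^n-\sum\nolimits_0^n\binom{x+1}{m+1}\frac{\delta x}{x+1}$$
$$= \binom{n}{m+1}H_n-\sum_{0\le k<n}\frac1{k+1}\binom{k+1}{m+1}.$$
The remaining sum is easy, since we can absorb the $(k+1)^{-1}$ using our old
standby, equation (5.5):
$$\sum_{0\le k<n}\frac1{k+1}\binom{k+1}{m+1} = \sum_{0\le k<n}\frac1{m+1}\binom{k}{m} = \frac1{m+1}\binom{n}{m+1}.$$
Thus we have the answer we seek:
$$\sum_{0\le k<n}\binom{k}{m}H_k = \binom{n}{m+1}\Bigl(H_n-\frac1{m+1}\Bigr). \tag{6.70}$$
(This checks nicely with (6.67) and (6.68) when m = 0 and m = 1.)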
The next example sum uses division instead of multiplication: Let us try
to evaluate
$$S_n = \sum_{k=1}^{n}\frac{H_k}{k}.$$
If we expand $H_k$ by its definition, we obtain a double sum,
$$S_n = \sum_{1\le j\le k\le n}\frac1{j\,k}.$$
Now another method from Chapter 2 comes to our aid; equation (2.33) tells
us that
$$S_n = \frac12\Bigl(\Bigl(\sum_{k=1}^{n}\frac1k\Bigr)^2+\sum_{k=1}^{n}\frac1{k^2}\Bigr) = \frac12\bigl(H_n^2+H_n^{(2)}\bigr). \tag{6.71}$$
It turns out that we could also have obtained this answer in another way if
we had tried to sum by parts (see exercise 26).
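The closed forms (6.70) and (6.71) can both be spot-checked with exact fractions. The sketch below is only a sanity check; `H`, `check_670` and `check_671` are hypothetical names introduced for this example.

```python
from fractions import Fraction
from math import comb

def H(n, r=1):
    """Generalized harmonic number H_n^(r), computed exactly."""
    return sum(Fraction(1, k ** r) for k in range(1, n + 1))

def check_670(n, m):
    """Compare both sides of (6.70) for given n and m."""
    lhs = sum(comb(k, m) * H(k) for k in range(n))
    rhs = comb(n, m + 1) * (H(n) - Fraction(1, m + 1))
    return lhs == rhs

def check_671(n):
    """Compare the sum of H_k / k for 1 <= k <= n with (H_n^2 + H_n^(2)) / 2."""
    lhs = sum(H(k) / k for k in range(1, n + 1))
    rhs = (H(n) ** 2 + H(n, 2)) / 2
    return lhs == rhs
```

Taking m = 0 and m = 1 in `check_670` corresponds to the special cases (6.67) and (6.68).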
Now let’s try our hands at a more difficult problem [291], which doesn’t
submit to summation by parts:
$$U_n = \sum_{k\ge1}\binom{n}{k}\frac{(-1)^{k-1}}{k}(n-k)^n,\qquad\text{integer } n\ge1.$$
(This sum doesn’t explicitly mention harmonic numbers either; but who
knows when they might turn up?)
[Margin note: (Not to give the answer away or anything.)]
We will solve this problem in two ways, one by grinding out the answer
and the other by being clever and/or lucky. First, the grinder’s approach. We
expand $(n-k)^n$ by the binomial theorem, so that the troublesome k in the
denominator will combine with the numerator:
$$U_n = \sum_{k\ge1}\binom{n}{k}\frac{(-1)^{k-1}}{k}\sum_j\binom{n}{j}(-k)^j n^{n-j}$$
$$= \sum_j\binom{n}{j}(-1)^{j-1}n^{n-j}\sum_{k\ge1}\binom{n}{k}(-1)^k k^{j-1}.$$
This isn’t quite the mess it seems, because the $k^{j-1}$ in the inner sum is a
polynomial in k, and identity (5.40) tells us that we are simply taking the
nth difference of this polynomial. Almost; first we must clean up a few things.
For one, $k^{j-1}$ isn’t a polynomial if j = 0; so we will need to split off
that term and handle it separately. For another, we’re missing the term k = 0
from the formula for nth difference; that term is nonzero when j = 1, so we had
better restore it (and subtract it out again). The result is
$$U_n = \sum_{j\ge1}\binom{n}{j}(-1)^{j-1}n^{n-j}\sum_{k\ge0}\binom{n}{k}(-1)^k k^{j-1}$$
$$\qquad - \sum_{j\ge1}\binom{n}{j}(-1)^{j-1}n^{n-j}\,[j=1]$$
$$\qquad - \binom{n}{0}n^n\sum_{k\ge1}\binom{n}{k}(-1)^k k^{-1}.$$
OK, now the top line (the only remaining double sum) is zero: It’s the sum
of multiples of nth differences of polynomials of degree less than n, and such
nth differences are zero. The second line is zero except when j = 1, when it
equals $-n^n$. So the third line is the only residual difficulty; we have
reduced the original problem to a much simpler sum:
$$U_n = n^n(T_n-1),\qquad\text{where } T_n = \sum_{k\ge1}\binom{n}{k}\frac{(-1)^{k-1}}{k}. \tag{6.72}$$
For example,
$U_3 = \binom31\frac81-\binom32\frac12 = \frac{45}2$;
$T_3 = \binom31\frac11-\binom32\frac12+\binom33\frac13 = \frac{11}6$; hence
$U_3 = 27(T_3-1)$ as claimed.
How can we evaluate $T_n$? One way is to replace $\binom{n}{k}$ by
$\binom{n-1}{k}+\binom{n-1}{k-1}$, obtaining a simple recurrence for $T_n$ in
terms of $T_{n-1}$. But there’s a more instructive way: We had a similar
formula in (5.41), namely
$$\sum_k\binom{n}{k}\frac{(-1)^k}{x+k} = \frac{n!}{x(x+1)\ldots(x+n)}.$$
If we subtract out the term for k = 0 and set x = 0, we get $-T_n$. So let’s
do it:
$$T_n = \Bigl(\frac1x-\frac{n!}{x(x+1)\ldots(x+n)}\Bigr)\Big|_{x=0}$$
$$= \Bigl(\frac{(x+1)\ldots(x+n)-n!}{x(x+1)\ldots(x+n)}\Bigr)\Big|_{x=0}$$
$$= \Bigl(\frac{x^n\left[{n+1\atop n+1}\right]+\cdots+x\left[{n+1\atop 2}\right]+\left[{n+1\atop 1}\right]-n!}{x(x+1)\ldots(x+n)}\Bigr)\Big|_{x=0} = \frac1{n!}\left[{n+1\atop 2}\right].$$
(We have used the expansion (6.11) of $(x+1)\ldots(x+n) = x^{\overline{n+1}}/x$;
we can divide x out of the numerator because $\left[{n+1\atop 1}\right] = n!$.)
But we know from (6.58) that $\left[{n+1\atop 2}\right] = n!\,H_n$; hence
$T_n = H_n$, and we have the answer:
$$U_n = n^n(H_n-1). \tag{6.73}$$
That’s one approach. The other approach will be to try to evaluate a
much more general sum,
$$U_n(x,y) = \sum_{k\ge1}\binom{n}{k}\frac{(-1)^{k-1}}{k}(x+ky)^n,\qquad\text{integer } n\ge0; \tag{6.74}$$
the value of the original $U_n$ will drop out as the special case $U_n(n,-1)$.
(We are encouraged to try for more generality because the previous derivation
“threw away” most of the details of the given problem; somehow those details
must be irrelevant, because the nth difference wiped them away.)
We could replay the previous derivation with small changes and discover
the value of $U_n(x,y)$. Or we could replace $(x+ky)^n$ by
$(x+ky)^{n-1}(x+ky)$ and then replace $\binom{n}{k}$ by
$\binom{n-1}{k}+\binom{n-1}{k-1}$, leading to the recurrence
$$U_n(x,y) = x\,U_{n-1}(x,y)+x^n/n+y\,x^{n-1}; \tag{6.75}$$
this can readily be solved with a summation factor (exercise 5).
But it’s easiest to use another trick that worked to our advantage in
Chapter 2: differentiation. The derivative of $U_n(x,y)$ with respect to y
brings out a k that cancels with the k in the denominator, and the resulting
sum is trivial:
$$\frac{\partial}{\partial y}U_n(x,y) = \sum_{k\ge1}\binom{n}{k}(-1)^{k-1}n(x+ky)^{n-1}$$
$$= \binom{n}{0}nx^{n-1}-\sum_{k\ge0}\binom{n}{k}(-1)^k n(x+ky)^{n-1} = nx^{n-1}.$$
(Once again, the nth difference of a polynomial of degree < n has vanished.)
We’ve proved that the derivative of $U_n(x,y)$ with respect to y is
$nx^{n-1}$, independent of y. In general, if $f'(y) = c$ then
$f(y) = f(0)+cy$; therefore we must have $U_n(x,y) = U_n(x,0)+nx^{n-1}y$.
The remaining task is to determine $U_n(x,0)$. But $U_n(x,0)$ is just $x^n$
times the sum $T_n = H_n$ we’ve already considered in (6.72); therefore the
general sum in (6.74) has the closed form
$$U_n(x,y) = x^nH_n+nx^{n-1}y. \tag{6.76}$$
In particular, the solution to the original problem is
$U_n(n,-1) = n^n(H_n-1)$.
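Since (6.76) is a polynomial identity in x and y, checking it at a grid of integer points gives reasonable confidence. The following sketch evaluates the defining sum (6.74) term by term; `H`, `U` and `U_closed` are hypothetical names, and the check is restricted to n ≥ 1 to avoid the power $x^{n-1}$ with a negative exponent.

```python
from fractions import Fraction
from math import comb

def H(n):
    """H_n as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))

def U(n, x, y):
    """The sum (6.74), evaluated term by term."""
    return sum(Fraction(comb(n, k) * (-1) ** (k - 1), k) * (x + k * y) ** n
               for k in range(1, n + 1))

def U_closed(n, x, y):
    """Closed form (6.76): x^n H_n + n x^(n-1) y, for n >= 1."""
    return x ** n * H(n) + n * x ** (n - 1) * y

print(U(3, 3, -1), U_closed(3, 3, -1))
```

The special case `U(n, n, -1)` is the original sum $U_n$, so it should match $n^n(H_n-1)$.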
6.5 BERNOULLI NUMBERS
The next important sequence of numbers on our agenda is named
after Jakob Bernoulli (1654-1705), who discovered curious relationships while
working out the formulas for sums of mth powers [22]. Let’s write
$$S_m(n) = 0^m+1^m+\cdots+(n-1)^m = \sum_{k=0}^{n-1}k^m = \sum\nolimits_0^n x^m\,\delta x. \tag{6.77}$$
(Thus, when m > 0 we have $S_m(n) = H_{n-1}^{(-m)}$ in the notation of
generalized harmonic numbers.) Bernoulli looked at the following sequence of
formulas and spotted a pattern:
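$$S_0(n) = n$$
$$S_1(n) = \tfrac12n^2-\tfrac12n$$
$$S_2(n) = \tfrac13n^3-\tfrac12n^2+\tfrac16n$$
$$S_3(n) = \tfrac14n^4-\tfrac12n^3+\tfrac14n^2$$
$$S_4(n) = \tfrac15n^5-\tfrac12n^4+\tfrac13n^3-\tfrac1{30}n$$
$$S_5(n) = \tfrac16n^6-\tfrac12n^5+\tfrac5{12}n^4-\tfrac1{12}n^2$$
$$S_6(n) = \tfrac17n^7-\tfrac12n^6+\tfrac12n^5-\tfrac16n^3+\tfrac1{42}n$$
$$S_7(n) = \tfrac18n^8-\tfrac12n^7+\tfrac7{12}n^6-\tfrac7{24}n^4+\tfrac1{12}n^2$$
$$S_8(n) = \tfrac19n^9-\tfrac12n^8+\tfrac23n^7-\tfrac7{15}n^5+\tfrac29n^3-\tfrac1{30}n$$
$$S_9(n) = \tfrac1{10}n^{10}-\tfrac12n^9+\tfrac34n^8-\tfrac7{10}n^6+\tfrac12n^4-\tfrac3{20}n^2$$
$$S_{10}(n) = \tfrac1{11}n^{11}-\tfrac12n^{10}+\tfrac56n^9-n^7+n^5-\tfrac12n^3+\tfrac5{66}n$$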
Can you see it too? The coefficient of $n^{m+1}$ in $S_m(n)$ is always
$1/(m+1)$. The coefficient of $n^m$ is always $-1/2$. The coefficient of
$n^{m-1}$ is always ... let’s see ... $m/12$. The coefficient of $n^{m-2}$ is
always zero. The coefficient of $n^{m-3}$ is always ... let’s see ... hmmm ...
yes, it’s $-m(m-1)(m-2)/720$. The coefficient of $n^{m-4}$ is always zero. And
it looks as if the pattern will continue, with the coefficient of $n^{m-k}$
always being some constant times $m^{\underline{k}}$.
That was Bernoulli’s discovery. In modern notation we write the coefficients
in the form
$$S_m(n) = \frac1{m+1}\Bigl(B_0n^{m+1}+\binom{m+1}{1}B_1n^m+\cdots+\binom{m+1}{m}B_mn\Bigr)$$
$$= \frac1{m+1}\sum_{k=0}^{m}\binom{m+1}{k}B_k\,n^{m+1-k}. \tag{6.78}$$
Bernoulli numbers are defined by an implicit recurrence relation,
$$\sum_{j=0}^{m}\binom{m+1}{j}B_j = [m=0],\qquad\text{for all } m\ge0. \tag{6.79}$$
For example, $\binom20B_0+\binom21B_1 = 0$. The first few values turn out to be
n    0  1     2    3  4      5  6     7  8      9  10    11  12
B_n  1  -1/2  1/6  0  -1/30  0  1/42  0  -1/30  0  5/66  0   -691/2730
(All conjectures about a simple closed form for $B_n$ are wiped out by the
appearance of the strange fraction -691/2730.)
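The implicit recurrence (6.79) can be solved for $B_m$ one value at a time, and the results plugged into Bernoulli’s formula (6.78). The sketch below uses exact fractions; `bernoulli` and `power_sum` are hypothetical names for this illustration.

```python
from fractions import Fraction
from math import comb

def bernoulli(m_max):
    """Return [B_0, ..., B_m_max], solving recurrence (6.79) for each B_m in turn."""
    B = []
    for m in range(m_max + 1):
        rhs = Fraction(1 if m == 0 else 0)               # the bracket [m = 0]
        partial = sum(comb(m + 1, j) * B[j] for j in range(m))
        B.append((rhs - partial) / comb(m + 1, m))       # binom(m+1, m) = m + 1
    return B

def power_sum(m, n, B):
    """S_m(n) = 0^m + ... + (n-1)^m, evaluated with formula (6.78)."""
    total = sum(comb(m + 1, k) * B[k] * n ** (m + 1 - k) for k in range(m + 1))
    return total / (m + 1)

B = bernoulli(12)
print(B[12], power_sum(2, 10, B))
```

Each step divides by m + 1, which is where the “difficult arithmetic with fractions” comes from.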
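We can prove Bernoulli’s formula (6.78) by induction on m, using the
perturbation method (one of the ways we found $S_2(n) = \Box_n$ in Chapter 2):
$$S_{m+1}(n)+n^{m+1} = \sum_{k=0}^{n-1}(k+1)^{m+1} = \sum_{k=0}^{n-1}\sum_{j=0}^{m+1}\binom{m+1}{j}k^j = \sum_{j=0}^{m+1}\binom{m+1}{j}S_j(n). \tag{6.80}$$
Let $\widehat{S}_m(n)$ be the right-hand side of (6.78); we wish to show that
$S_m(n) = \widehat{S}_m(n)$, assuming that $S_j(n) = \widehat{S}_j(n)$ for
$0\le j<m$. We begin as we did for m = 2 in Chapter 2, subtracting
$S_{m+1}(n)$ from both sides of (6.80). Then we expand each $S_j(n)$ using
(6.78), and regroup so that the coefficients of powers of n on the right-hand
side are brought together and simplified:
[Margin note: Here’s some more neat stuff that you’ll probably want to skim
through the first time. -Friendly TA]
[Margin note: Start Skimming]
$$n^{m+1} = \sum_{j=0}^{m}\binom{m+1}{j}S_j(n) = \sum_{j=0}^{m}\binom{m+1}{j}\widehat{S}_j(n)+\binom{m+1}{m}\Delta$$
$$= \sum_{j=0}^{m}\binom{m+1}{j}\frac1{j+1}\sum_{k=0}^{j}\binom{j+1}{k}B_k\,n^{j+1-k}+(m+1)\Delta$$
$$= \sum_{0\le k\le j\le m}\binom{m+1}{j}\binom{j+1}{k}\frac{B_k}{j+1}\,n^{j+1-k}+(m+1)\Delta$$
$$= \sum_{0\le k\le j\le m}\binom{m+1}{j}\binom{j+1}{j-k}\frac{B_{j-k}}{j+1}\,n^{k+1}+(m+1)\Delta$$
$$= \sum_{0\le k\le m}\frac{n^{k+1}}{k+1}\sum_{k\le j\le m}\binom{m+1}{j}\binom{j}{k}B_{j-k}+(m+1)\Delta$$
$$= \sum_{0\le k\le m}\frac{n^{k+1}}{k+1}\binom{m+1}{k}\sum_{0\le j\le m-k}\binom{m+1-k}{j}B_j+(m+1)\Delta$$
$$= \sum_{0\le k\le m}\frac{n^{k+1}}{k+1}\binom{m+1}{k}[m-k=0]+(m+1)\Delta$$
$$= n^{m+1}+(m+1)\Delta,\qquad\text{where } \Delta = S_m(n)-\widehat{S}_m(n).$$
(This derivation is a good review of the standard manipulations we learned
in Chapter 5.) Thus $\Delta = 0$ and $S_m(n) = \widehat{S}_m(n)$, QED.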
In Chapter 7 we’ll use generating functions to obtain a much simpler
proof of (6.78). The key idea will be to show that the Bernoulli numbers are
the coefficients of the power series
$$\frac{z}{e^z-1} = \sum_{n\ge0}B_n\frac{z^n}{n!}. \tag{6.81}$$
Let’s simply assume for now that equation (6.81) holds, so that we can derive
some of its amazing consequences. If we add $\frac12z$ to both sides, thereby
cancelling the term $B_1z/1! = -\frac12z$ from the right, we get
$$\frac{z}{e^z-1}+\frac z2 = \frac z2\,\frac{e^z+1}{e^z-1} = \frac z2\,\frac{e^{z/2}+e^{-z/2}}{e^{z/2}-e^{-z/2}} = \frac z2\coth\frac z2. \tag{6.82}$$
Here coth is the “hyperbolic cotangent” function, otherwise known in calculus
books as cosh z/sinh z; we have
$$\sinh z = \frac{e^z-e^{-z}}{2};\qquad \cosh z = \frac{e^z+e^{-z}}{2}.$$
Changing z to $-z$ gives
$\bigl(\frac{-z}2\bigr)\coth\bigl(\frac{-z}2\bigr) = \frac z2\coth\frac z2$;
hence every odd-numbered coefficient of $\frac z2\coth\frac z2$ must be zero,
and we have
$$B_3 = B_5 = B_7 = B_9 = B_{11} = B_{13} = \cdots = 0. \tag{6.84}$$
Furthermore (6.82) leads to a closed form for the coefficients of coth:
$$z\coth z = \frac{2z}{e^{2z}-1}+\frac{2z}{2} = \sum_{n\ge0}B_{2n}\frac{(2z)^{2n}}{(2n)!} = \sum_{n\ge0}4^nB_{2n}\frac{z^{2n}}{(2n)!}. \tag{6.85}$$
But there isn’t much of a market for hyperbolic functions; people are more
interested in the “real” functions of trigonometry. We can express ordinary
trigonometric functions in terms of their hyperbolic cousins by using the rules
$$\sin z = -i\sinh iz,\qquad \cos z = \cosh iz; \tag{6.86}$$
the corresponding power series are
$$\sin z = \frac{z^1}{1!}-\frac{z^3}{3!}+\frac{z^5}{5!}-\cdots,\qquad \sinh z = \frac{z^1}{1!}+\frac{z^3}{3!}+\frac{z^5}{5!}+\cdots;$$
$$\cos z = \frac{z^0}{0!}-\frac{z^2}{2!}+\frac{z^4}{4!}-\cdots,\qquad \cosh z = \frac{z^0}{0!}+\frac{z^2}{2!}+\frac{z^4}{4!}+\cdots.$$
[Margin note: I see, we get “real” functions by using imaginary numbers.]
Hence $\cot z = \cos z/\sin z = i\cosh iz/\sinh iz = i\coth iz$, and we have
$$z\cot z = \sum_{n\ge0}B_{2n}\frac{(2iz)^{2n}}{(2n)!} = \sum_{n\ge0}(-4)^nB_{2n}\frac{z^{2n}}{(2n)!}. \tag{6.87}$$
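Another remarkable formula for $z\cot z$ was found by Euler (exercise 73):
$$z\cot z = 1-2\sum_{k\ge1}\frac{z^2}{k^2\pi^2-z^2}. \tag{6.88}$$
We can expand Euler’s formula in powers of $z^2$, obtaining
$$z\cot z = 1-2\sum_{k\ge1}\Bigl(\frac{z^2}{k^2\pi^2}+\frac{z^4}{k^4\pi^4}+\frac{z^6}{k^6\pi^6}+\cdots\Bigr) = 1-2\Bigl(\frac{z^2}{\pi^2}H_\infty^{(2)}+\frac{z^4}{\pi^4}H_\infty^{(4)}+\frac{z^6}{\pi^6}H_\infty^{(6)}+\cdots\Bigr).$$
Equating coefficients of $z^{2n}$ with those in our other formula, (6.87),
gives us an almost miraculous closed form for infinitely many infinite sums:
$$\zeta(2n) = H_\infty^{(2n)} = (-1)^{n-1}\frac{2^{2n-1}\pi^{2n}B_{2n}}{(2n)!},\qquad\text{integer } n>0. \tag{6.89}$$
For example,
$$\zeta(2) = H_\infty^{(2)} = 1+\frac14+\frac19+\cdots = \pi^2B_2 = \pi^2/6; \tag{6.90}$$
$$\zeta(4) = H_\infty^{(4)} = 1+\frac1{16}+\frac1{81}+\cdots = -\pi^4B_4/3 = \pi^4/90. \tag{6.91}$$
Formula (6.89) is not only a closed form for $H_\infty^{(2n)}$, it also tells
us the approximate size of $B_{2n}$, since $H_\infty^{(2n)}$ is very near 1
when n is large. And it tells us that $(-1)^{n-1}B_{2n} > 0$ for all n > 0;
thus the nonzero Bernoulli numbers alternate in sign.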
And that’s not all. Bernoulli numbers also appear in the coefficients of
the tangent function,
$$\tan z = \frac{\sin z}{\cos z} = \sum_{n\ge0}(-1)^{n-1}4^n(4^n-1)B_{2n}\frac{z^{2n-1}}{(2n)!}, \tag{6.92}$$
as well as other trigonometric functions (exercise 70). Formula (6.92) leads
to another important fact about the Bernoulli numbers, namely that
$$T_{2n-1} = (-1)^{n-1}\frac{4^n(4^n-1)}{2n}B_{2n}\quad\text{is a positive integer.} \tag{6.93}$$
We have, for example:
n    1  3  5   7    9     11      13
T_n  1  2  16  272  7936  353792  22368256
(The T’s are called tangent numbers.)
One way to prove (6.93), following an idea of B. F. Logan, is to consider
the power series
$$\frac{\sin z+x\cos z}{\cos z-x\sin z} = x+(1+x^2)z+(2x^3+2x)\frac{z^2}2+(6x^4+8x^2+2)\frac{z^3}6+\cdots = \sum_{n\ge0}T_n(x)\frac{z^n}{n!}, \tag{6.94}$$
where $T_n(x)$ is a polynomial in x; setting x = 0 gives $T_n(0) = T_n$, the
nth tangent number.
[Margin note: When x = tan w, this is tan(z + w).]
If we differentiate (6.94) with respect to x, we get
$$\frac1{(\cos z-x\sin z)^2} = \sum_{n\ge0}T_n'(x)\frac{z^n}{n!};$$
but if we differentiate with respect to z, we get
$$\frac{1+x^2}{(\cos z-x\sin z)^2} = \sum_{n\ge1}T_n(x)\frac{z^{n-1}}{(n-1)!} = \sum_{n\ge0}T_{n+1}(x)\frac{z^n}{n!}.$$
(Try it; the cancellation is very pretty.) Therefore we have
$$T_{n+1}(x) = (1+x^2)\,T_n'(x),\qquad T_0(x) = x, \tag{6.95}$$
a simple recurrence from which it follows that the coefficients of $T_n(x)$
are nonnegative integers. Moreover, we can easily prove that $T_n(x)$ has
degree n + 1, and that its coefficients are alternately zero and positive.
Therefore $T_{2n+1}(0) = T_{2n+1}$ is a positive integer, as claimed in (6.93).
Recurrence (6.95) gives us a simple way to calculate Bernoulli numbers,
via tangent numbers, using only simple operations on integers; by contrast,
the defining recurrence (6.79) involves difficult arithmetic with fractions.
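The integer-only route can be sketched by representing each $T_n(x)$ as a list of coefficients and applying recurrence (6.95); Bernoulli numbers then follow from (6.93) with a single division at the end. The names `tangent_polys` and `bernoulli_even` are hypothetical, chosen for this illustration.

```python
from fractions import Fraction

def tangent_polys(n_max):
    """T_0(x), ..., T_n_max(x) as coefficient lists (index = power of x), via (6.95)."""
    polys = [[0, 1]]                                   # T_0(x) = x
    for _ in range(n_max):
        p = polys[-1]
        dp = [i * c for i, c in enumerate(p)][1:]      # derivative T_n'(x)
        q = dp + [0, 0]                                # start with 1 * T_n'(x)
        for i, c in enumerate(dp):                     # add x^2 * T_n'(x)
            q[i + 2] += c
        polys.append(q)
    return polys

def bernoulli_even(n, polys):
    """B_{2n} recovered from the tangent number T_{2n-1} using (6.93)."""
    t = polys[2 * n - 1][0]                            # T_{2n-1} = T_{2n-1}(0)
    return Fraction((-1) ** (n - 1) * 2 * n * t, 4 ** n * (4 ** n - 1))

polys = tangent_polys(13)
print([p[0] for p in polys[1::2]])
print(bernoulli_even(6, polys))
```

Only the final step leaves the integers, which is the practical advantage the paragraph above points to.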
If we want to compute the sum of mth powers from a to b - 1 instead of
from 0 to n - 1, the theory of Chapter 2 tells us that
$$\sum_{k=a}^{b-1}k^m = \sum\nolimits_a^b x^m\,\delta x = S_m(b)-S_m(a). \tag{6.96}$$
This identity has interesting consequences when we consider negative values
of k: We have
$$\sum_{k=-n+1}^{-1}k^m = (-1)^m\sum_{k=0}^{n-1}k^m,\qquad\text{when } m>0,$$
hence
$$S_m(0)-S_m(-n+1) = (-1)^m\bigl(S_m(n)-S_m(0)\bigr).$$
But $S_m(0) = 0$, so we have the identity
$$S_m(1-n) = (-1)^{m+1}S_m(n),\qquad m>0. \tag{6.97}$$
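Therefore $S_m(1) = 0$. If we write the polynomial $S_m(n)$ in factored form,
it will always have the factors n and (n - 1), because it has the roots 0 and
1. In general, $S_m(n)$ is a polynomial of degree m + 1 with leading term
$\frac1{m+1}n^{m+1}$. Moreover, we can set $n = \frac12$ in (6.97) to get
$S_m(\frac12) = (-1)^{m+1}S_m(\frac12)$; if m is even, this makes
$S_m(\frac12) = 0$, so $(n-\frac12)$ will be an additional factor. These
observations explain why we found the simple factorization
$$S_2(n) = \tfrac13n(n-\tfrac12)(n-1)$$
in Chapter 2; we could have used such reasoning to deduce the value of $S_2(n)$
without calculating it! Furthermore, (6.97) implies that the polynomial with
the remaining factors, $\widehat{S}_m(n) = S_m(n)/(n-\frac12)$, always satisfies
$$\widehat{S}_m(1-n) = \widehat{S}_m(n),\qquad m \text{ even},\ m>0.$$
It follows that $S_m(n)$ can always be written in the factored form
$$S_m(n) = \begin{cases}\displaystyle\frac1{m+1}\prod_{k=1}^{\lceil m/2\rceil}\bigl(n-\tfrac12-\alpha_k\bigr)\bigl(n-\tfrac12+\alpha_k\bigr), & m \text{ odd};\\[2ex] \displaystyle\frac{n-\frac12}{m+1}\prod_{k=1}^{m/2}\bigl(n-\tfrac12-\alpha_k\bigr)\bigl(n-\tfrac12+\alpha_k\bigr), & m \text{ even}.\end{cases} \tag{6.98}$$
Here $\alpha_1 = \frac12$, and $\alpha_2,\ldots,\alpha_{\lceil m/2\rceil}$ are
appropriate complex numbers whose values depend on m. For example,
$$S_3(n) = n^2(n-1)^2/4;$$
$$S_4(n) = n(n-\tfrac12)(n-1)\bigl(n-\tfrac12+\sqrt{7/12}\bigr)\bigl(n-\tfrac12-\sqrt{7/12}\bigr)/5;$$
$$S_5(n) = n^2(n-1)^2\bigl(n-\tfrac12+\sqrt{3/4}\bigr)\bigl(n-\tfrac12-\sqrt{3/4}\bigr)/6;$$
$$S_6(n) = n(n-\tfrac12)(n-1)\bigl(n-\tfrac12+\alpha\bigr)\bigl(n-\tfrac12-\alpha\bigr)\bigl(n-\tfrac12+\bar\alpha\bigr)\bigl(n-\tfrac12-\bar\alpha\bigr)/7,$$
where $\alpha = 2^{-3/2}\,3^{-1/4}\bigl(\sqrt{\sqrt{31}+\sqrt{27}}+i\sqrt{\sqrt{31}-\sqrt{27}}\,\bigr)$.
If m is odd and greater than 1, we have $B_m = 0$; hence $S_m(n)$ is divisible
by $n^2$ (and by $(n-1)^2$). Otherwise the roots of $S_m(n)$ don’t seem to obey
a simple law.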
Let’s conclude our study of Bernoulli numbers by looking at how they
relate to Stirling numbers. One way to compute S,(n) is to change ordinary
powers to falling powers, since the falling powers have easy sums. After doing
those easy sums we can convert back to ordinary powers:
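$$\begin{aligned}
S_m(n) = \sum_{k=0}^{n-1} k^m
 &= \sum_{k=0}^{n-1}\sum_{j\ge0} {m \brace j} k^{\underline{j}}
  = \sum_{j\ge0} {m \brace j} \sum_{k=0}^{n-1} k^{\underline{j}} \\
 &= \sum_{j\ge0} {m \brace j} \frac{n^{\underline{j+1}}}{j+1} \\
 &= \sum_{j\ge0} {m \brace j} \frac{1}{j+1} \sum_{k\ge0} (-1)^{j+1-k} {j+1 \brack k} n^k .
\end{aligned}$$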
Therefore, equating coefficients with those in (6.78), we must have the identity
$$\sum_{j\ge0} {m \brace j}{j+1 \brack k}\frac{(-1)^{j+1-k}}{j+1} = \frac{1}{m+1}\binom{m+1}{k}B_{m+1-k}.$$   (6.99)
It would be nice to prove this relation directly, thereby discovering Bernoulli
numbers in a new way. But the identities in Tables 250 or 251 don’t give
us any obvious handle on a proof by induction that the left-hand sum in
(6.99) is a constant times $m^{\underline{k-1}}$. If k = m + 1, the left-hand sum is just
${m \brace m}{m+1 \brack m+1}/(m+1) = 1/(m+1)$, so that case is easy. And if k = m, the
left-hand side sums to
${m \brace m-1}{m \brack m}m^{-1} - {m \brace m}{m+1 \brack m}(m+1)^{-1} = \frac12(m-1) - \frac12 m = -\frac12$;
so that case is pretty easy too. But if k < m, the left-hand sum looks hairy.
Bernoulli would probably not have discovered his numbers if he had taken
this route.
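Although a direct proof looks hairy, identity (6.99) is easy to test numerically. The sketch below assumes the standard recurrences for both kinds of Stirling numbers and the Bernoulli recurrence (6.79); the helper names are hypothetical, and the check covers only the range $1 \le k \le m+1$ in which $n^k$ actually occurs in $S_m(n)$.

```python
from fractions import Fraction
from math import comb

def bernoulli(limit):
    """[B_0, ..., B_limit] with B_1 = -1/2."""
    B = []
    for m in range(limit + 1):
        acc = sum(comb(m + 1, j) * B[j] for j in range(m))
        B.append((Fraction(int(m == 0)) - acc) / (m + 1))
    return B

def stirling_triangles(size):
    """Return (subset, cycle): Stirling numbers of the second and first kind."""
    subset = [[0] * (size + 1) for _ in range(size + 1)]
    cycle = [[0] * (size + 1) for _ in range(size + 1)]
    subset[0][0] = cycle[0][0] = 1
    for n in range(1, size + 1):
        for k in range(1, n + 1):
            subset[n][k] = k * subset[n - 1][k] + subset[n - 1][k - 1]
            cycle[n][k] = (n - 1) * cycle[n - 1][k] + cycle[n - 1][k - 1]
    return subset, cycle

def left_side(m, k, subset, cycle):
    """Left-hand sum of (6.99)."""
    total = Fraction(0)
    for j in range(m + 1):
        sign = -1 if (j + 1 - k) % 2 else 1
        total += Fraction(sign * subset[m][j] * cycle[j + 1][k], j + 1)
    return total

def right_side(m, k, B):
    """Right-hand side of (6.99)."""
    return Fraction(comb(m + 1, k)) * B[m + 1 - k] / (m + 1)
```

For instance, with m = 2 and k = 1 both sides evaluate to 1/6, which is $B_2$.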
6.6 FIBONACCI NUMBERS
[Margin note: The back-to-nature nature of this example is shocking. This book should be banned.]
[Margin note: Phyllotaxis, n. The love of taxis.]
Unlike the harmonic numbers and the Bernoulli numbers, the Fibonacci num-
bers are nice simple integers. They are defined by the recurrence
$$F_0 = 0; \qquad F_1 = 1; \qquad F_n = F_{n-1} + F_{n-2}, \quad\text{for } n > 1.$$   (6.102)
The simplicity of this rule-the simplest possible recurrence in which each
number depends on the previous two-accounts for the fact that Fibonacci
numbers occur in a wide variety of situations.
“Bee trees” provide a good example of how Fibonacci numbers can arise
naturally. Let’s consider the pedigree of a male bee. Each male (also known
as a drone) is produced asexually from a female (also known as a queen); each
female, however, has two parents, a male and a female. Here are the first few
levels of the tree:
The drone has one grandfather and one grandmother; he has one great-
grandfather and two great-grandmothers; he has two great-great-grandfathers
and three great-great-grandmothers. In general, it is easy to see by induction
that he has exactly $F_{n+1}$ great$^n$-grandpas and $F_{n+2}$ great$^n$-grandmas.
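The induction behind the bee tree can be replayed level by level. This is a minimal sketch under the rules stated above (a male has only a mother, a female has both parents); the names `fib` and `ancestors` are ours.

```python
def fib(n):
    """F_n for n >= 0, from the recurrence (6.102)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def ancestors(level):
    """Return (males, females) among a drone's ancestors `level` generations back."""
    males, females = 1, 0              # level 0: the drone himself
    for _ in range(level):
        # each female contributes one father; every bee contributes one mother
        males, females = females, males + females
    return males, females

# Level n + 2 holds the great^n-grandparents.
assert ancestors(4) == (2, 3)          # great-great-grandfathers, great-great-grandmothers
```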
Fibonacci numbers are often found in nature, perhaps for reasons similar
to the bee-tree law. For example, a typical sunflower has a large head that
contains spirals of tightly packed florets, usually with 34 winding in one di-
rection and 55 in another. Smaller heads will have 21 and 34, or 13 and 21;
a gigantic sunflower with 89 and 144 spirals was once exhibited in England.
Similar patterns are found in some species of pine cones.
And here’s an example of a different nature [219]: Suppose we put two
panes of glass back-to-back. How many ways $a_n$ are there for light rays to
pass through or be reflected after changing direction n times? The first few
cases are:
$$a_0 = 1, \qquad a_1 = 2, \qquad a_2 = 3, \qquad a_3 = 5.$$
When n is even, we have an even number of bounces and the ray passes
through; when n is odd, the ray is reflected and it re-emerges on the same
side it entered. The $a_n$’s seem to be Fibonacci numbers, and a little staring
at the figure tells us why: For $n \ge 2$, the n-bounce rays either take their
first bounce off the opposite surface and continue in $a_{n-1}$ ways, or they begin
by bouncing off the middle surface and then bouncing back again to finish
in $a_{n-2}$ ways. Thus we have the Fibonacci recurrence $a_n = a_{n-1} + a_{n-2}$.
The initial conditions are different, but not very different, because we have
$a_0 = 1 = F_2$ and $a_1 = 2 = F_3$; therefore everything is simply shifted two
places, and $a_n = F_{n+2}$.
Leonardo Fibonacci introduced these numbers in 1202, and mathematicians
gradually began to discover more and more interesting things about
them. Édouard Lucas, the perpetrator of the Tower of Hanoi puzzle discussed
in Chapter 1, worked with them extensively in the last half of the nineteenth
century (in fact it was Lucas who popularized the name “Fibonacci
numbers”). One of his amazing results was to use properties of Fibonacci
numbers to prove that the 39-digit Mersenne number $2^{127} - 1$ is prime.
[Margin note: “La suite de Fibonacci possède des propriétés nombreuses fort intéressantes.” -E. Lucas [207]]
One of the oldest theorems about Fibonacci numbers, due to the French
astronomer Jean-Dominique Cassini in 1680 [45], is the identity
$$F_{n+1}F_{n-1} - F_n^2 = (-1)^n, \qquad\text{for } n > 0.$$   (6.103)
When n = 6, for example, Cassini’s identity correctly claims that $13\cdot5 - 8^2 = 1$.
A polynomial formula that involves Fibonacci numbers of the form $F_{n\pm k}$
for small values of k can be transformed into a formula that involves only
$F_n$ and $F_{n+1}$, because we can use the rule
$$F_m = F_{m+2} - F_{m+1}$$   (6.104)
to express $F_m$ in terms of higher Fibonacci numbers when m < n, and we can use
$$F_m = F_{m-2} + F_{m-1}$$   (6.105)
to replace $F_m$ by lower Fibonacci numbers when m > n + 1. Thus, for example,
we can replace $F_{n-1}$ by $F_{n+1} - F_n$ in (6.103) to get Cassini’s identity in the form
$$F_{n+1}^2 - F_{n+1}F_n - F_n^2 = (-1)^n.$$   (6.106)
Moreover, Cassini’s identity reads
$$F_{n+2}F_n - F_{n+1}^2 = (-1)^{n+1}$$
when n is replaced by n + 1; this is the same as $(F_{n+1} + F_n)F_n - F_{n+1}^2 = (-1)^{n+1}$,
which is the same as (6.106). Thus Cassini(n) is true if and only if
Cassini(n+1) is true; equation (6.103) holds for all n by induction.
Cassini’s identity is the basis of a geometrical paradox that was one of
Lewis Carroll’s favorite puzzles
[54],
[258],
[298].
The idea is to take a chess-
board and cut it into four pieces as shown here, then to reassemble the pieces
into a rectangle:
Presto: The original area of 8 x 8 = 64 squares has been rearranged to yield
5 x 13 = 65 squares! A similar construction dissects any $F_n \times F_n$ square
into four pieces, using $F_{n+1}$, $F_n$, $F_{n-1}$, and $F_{n-2}$ as dimensions wherever the
illustration has 13, 8, 5, and 3 respectively. The result is an $F_{n-1} \times F_{n+1}$
rectangle; by (6.103), one square has therefore been gained or lost, depending
on whether n is even or odd.
[Margin note: The paradox is explained because, well, magic tricks aren’t supposed to be explained.]
Strictly speaking, we can’t apply the reduction (6.105) unless $m \ge 2$,
because we haven’t defined $F_n$ for negative n. A lot of maneuvering becomes
easier if we eliminate this boundary condition and use (6.104) and (6.105) to
define Fibonacci numbers with negative indices. For example, $F_{-1}$ turns out
to be $F_1 - F_0 = 1$; then $F_{-2}$ is $F_0 - F_{-1} = -1$. In this way we deduce the values

  n   |  0  -1  -2  -3  -4  -5  -6  -7   -8  -9  -10  -11
  F_n |  0   1  -1   2  -3   5  -8  13  -21  34  -55   89

and it quickly becomes clear (by induction) that
$$F_{-n} = (-1)^{n-1}F_n, \qquad\text{integer } n.$$   (6.107)
Cassini’s identity (6.103) is true for all integers n, not just for n > 0, when
we extend the Fibonacci sequence in this way.
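The backward extension is mechanical enough to try out. The sketch below (the function name `fib` is our own) steps the pair $(F_k, F_{k+1})$ forward with (6.105) or backward with (6.104), then spot-checks (6.107) and Cassini’s identity on both sides of zero.

```python
def fib(n):
    """F_n for any integer n, extended backward via F_m = F_{m+2} - F_{m+1}."""
    a, b = 0, 1                        # (F_0, F_1)
    if n >= 0:
        for _ in range(n):
            a, b = b, a + b            # forward step: (F_{k+1}, F_{k+2})
    else:
        for _ in range(-n):
            a, b = b - a, a            # backward step: (F_{k-1}, F_k)
    return a

# (6.107) and Cassini's identity (6.103), including negative n.
for n in range(1, 20):
    assert fib(-n) == (-1) ** (n - 1) * fib(n)
for n in range(-20, 21):
    assert fib(n + 1) * fib(n - 1) - fib(n) ** 2 == (1 if n % 2 == 0 else -1)
```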
The process of reducing $F_{n\pm k}$ to a combination of $F_n$ and $F_{n+1}$ by using
(6.105) and (6.104) leads to the sequence of formulas
$$\begin{aligned}
F_{n+2} &= F_{n+1} + F_n   & F_{n-1} &= F_{n+1} - F_n \\
F_{n+3} &= 2F_{n+1} + F_n  & F_{n-2} &= -F_{n+1} + 2F_n \\
F_{n+4} &= 3F_{n+1} + 2F_n & F_{n-3} &= 2F_{n+1} - 3F_n \\
F_{n+5} &= 5F_{n+1} + 3F_n & F_{n-4} &= -3F_{n+1} + 5F_n
\end{aligned}$$
in which another pattern becomes obvious:
$$F_{n+k} = F_kF_{n+1} + F_{k-1}F_n.$$   (6.108)
This identity, easily proved by induction, holds for all integers k and n
(positive, negative, or zero).
If we set k = n in (6.108), we find that
$$F_{2n} = F_nF_{n+1} + F_{n-1}F_n;$$   (6.109)
hence $F_{2n}$ is a multiple of $F_n$. Similarly,
$$F_{3n} = F_{2n}F_{n+1} + F_{2n-1}F_n,$$
and we may conclude that $F_{3n}$ is also a multiple of $F_n$. By induction,
$$F_{kn} \text{ is a multiple of } F_n,$$   (6.110)
for all integers k and n. This explains, for example, why $F_{15}$ (which equals
610) is a multiple of both $F_3$ and $F_5$ (which are equal to 2 and 5). Even more
is true, in fact; exercise 27 proves that
$$\gcd(F_m, F_n) = F_{\gcd(m,n)}.$$   (6.111)
For example, $\gcd(F_{12}, F_{18}) = \gcd(144, 2584) = 8 = F_6$.
We can now prove a converse of (6.110): If n > 2 and if $F_m$ is a multiple of
$F_n$, then m is a multiple of n. For if $F_n \backslash F_m$ then
$F_n \backslash \gcd(F_m, F_n) = F_{\gcd(m,n)} \le F_n$. This is possible only if
$F_{\gcd(m,n)} = F_n$; and our assumption that n > 2
makes it mandatory that gcd(m, n) = n. Hence $n \backslash m$.
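The addition rule (6.108) and the gcd law (6.111) lend themselves to a quick numerical spot check. In this sketch the function `fib` is our own helper; it handles negative indices through (6.107).

```python
from math import gcd

def fib(n):
    """F_n for any integer n, using F_{-n} = (-1)^(n-1) F_n for negative indices."""
    a, b = 0, 1
    for _ in range(abs(n)):
        a, b = b, a + b
    return a if n >= 0 or n % 2 else -a

def addition_rule_holds(n, k):
    """Check F_{n+k} = F_k F_{n+1} + F_{k-1} F_n, identity (6.108)."""
    return fib(n + k) == fib(k) * fib(n + 1) + fib(k - 1) * fib(n)

def gcd_law_holds(m, n):
    """Check gcd(F_m, F_n) = F_{gcd(m, n)}, identity (6.111), for positive m and n."""
    return gcd(fib(m), fib(n)) == fib(gcd(m, n))

assert gcd(fib(12), fib(18)) == 8 == fib(6)
```

A finite check like this proves nothing, of course, but it guards against sign and index slips when the identities are transcribed.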
An extension of these divisibility ideas was used by Yuri Matijasevich in
his famous proof
[213]
that there is no algorithm to decide if a given multivari-
ate polynomial equation with integer coefficients has a solution in integers.
Matijasevich’s lemma states that, if n > 2, the Fibonacci number $F_m$ is a
multiple of $F_n^2$ if and only if m is a multiple of $nF_n$.
Let’s prove this by looking at the sequence $\langle F_{kn} \bmod F_n^2\rangle$ for k = 1, 2,
3, ..., and seeing when $F_{kn} \bmod F_n^2 = 0$. (We know that m must have the
form kn if $F_m \bmod F_n = 0$.) First we have $F_n \bmod F_n^2 = F_n$; that’s not zero.
Next we have
$$F_{2n} = F_nF_{n+1} + F_{n-1}F_n \equiv 2F_nF_{n+1} \pmod{F_n^2},$$
by (6.108), since $F_{n+1} \equiv F_{n-1} \pmod{F_n}$. Similarly
$$F_{2n+1} = F_{n+1}^2 + F_n^2 \equiv F_{n+1}^2 \pmod{F_n^2}.$$
This congruence allows us to compute
$$\begin{aligned}
F_{3n} &= F_{2n+1}F_n + F_{2n}F_{n-1} \\
       &\equiv F_{n+1}^2F_n + (2F_nF_{n+1})F_{n+1} = 3F_{n+1}^2F_n \pmod{F_n^2};\\
F_{3n+1} &= F_{2n+1}F_{n+1} + F_{2n}F_n \\
       &\equiv F_{n+1}^3 + (2F_nF_{n+1})F_n \equiv F_{n+1}^3 \pmod{F_n^2}.
\end{aligned}$$
In general, we find by induction on k that
$$F_{kn} \equiv kF_nF_{n+1}^{k-1} \quad\text{and}\quad F_{kn+1} \equiv F_{n+1}^k \pmod{F_n^2}.$$
Now $F_{n+1}$ is relatively prime to $F_n$, so
$$F_{kn} \equiv 0 \pmod{F_n^2} \iff kF_n \equiv 0 \pmod{F_n^2} \iff k \equiv 0 \pmod{F_n}.$$
We have proved Matijasevich’s lemma.
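Both the lemma and the congruence used to prove it can be tried on small cases. The following sketch uses hypothetical helper names and simply tests the statements as given above for a few values of n > 2.

```python
def fib(n):
    """F_n for n >= 0."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def square_divides(n, m):
    """True when F_n^2 divides F_m."""
    return fib(m) % fib(n) ** 2 == 0

def congruence_holds(n, k):
    """Check F_{kn} = k F_n F_{n+1}^(k-1) modulo F_n^2, for k >= 1."""
    modulus = fib(n) ** 2
    return fib(k * n) % modulus == (k * fib(n) * fib(n + 1) ** (k - 1)) % modulus

# Matijasevich's lemma: F_n^2 divides F_m exactly when n F_n divides m (n > 2).
for n in range(3, 8):
    for m in range(1, 200):
        assert square_divides(n, m) == (m % (n * fib(n)) == 0)
```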
One of the most important properties of the Fibonacci numbers is the
special way in which they can be used to represent integers. Let’s write
$$j \gg k \iff j \ge k + 2.$$   (6.112)
Then every positive integer has a unique representation of the form
$$n = F_{k_1} + F_{k_2} + \cdots + F_{k_r}, \qquad k_1 \gg k_2 \gg \cdots \gg k_r \gg 0.$$   (6.113)
(This is “Zeckendorf’s theorem” [201], [312].) For example, the representation
of one million turns out to be
$$\begin{aligned}
1000000 &= 832040 + 121393 + 46368 + 144 + 55 \\
        &= F_{30} + F_{26} + F_{24} + F_{12} + F_{10}.
\end{aligned}$$
We can always find such a representation by using a “greedy” approach,
choosing $F_{k_1}$ to be the largest Fibonacci number $\le n$, then choosing $F_{k_2}$
to be the largest that is $\le n - F_{k_1}$, and so on. (More precisely, suppose that
$F_k \le n < F_{k+1}$; then we have $0 \le n - F_k < F_{k+1} - F_k = F_{k-1}$. If n is a
Fibonacci number, (6.113) holds with r = 1 and $k_1 = k$. Otherwise $n - F_k$
has a Fibonacci representation $F_{k_2} + \cdots + F_{k_r}$ by induction on n; and (6.113)
holds if we set $k_1 = k$, because the inequalities $F_{k_2} \le n - F_k < F_{k-1}$ imply
that $k \gg k_2$.) Conversely, any representation of the form (6.113) implies that
$$F_{k_1} \le n < F_{k_1+1},$$
because the largest possible value of $F_{k_2} + \cdots + F_{k_r}$ when
$k \gg k_2 \gg \cdots \gg k_r \gg 0$ is
$$F_{k-2} + F_{k-4} + \cdots + F_{k \bmod 2 + 2} = F_{k-1} - 1, \qquad\text{if } k \ge 2.$$   (6.114)
(This formula is easy to prove by induction on k; the left-hand side is zero
when k is 2 or 3.) Therefore $k_1$ is the greedily chosen value described earlier,
and the representation must be unique.
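The greedy procedure just described translates almost word for word into code. This is a sketch only; the function name `zeckendorf` and the choice to return the list of indices $k_1, \ldots, k_r$ are our own conventions.

```python
def zeckendorf(n):
    """Return [k_1, ..., k_r] with n = F_{k_1} + ... + F_{k_r}, chosen greedily."""
    if n < 0:
        raise ValueError("n must be nonnegative")
    fibs = [0, 1]                          # fibs[k] = F_k
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    indices = []
    k = len(fibs) - 1
    while n > 0:
        while fibs[k] > n:                 # largest Fibonacci number <= n
            k -= 1
        indices.append(k)
        n -= fibs[k]
    return indices

print(zeckendorf(1000000))                 # [30, 26, 24, 12, 10]
```

Because the scan always moves downward from a large index, the value 1 is picked up as $F_2$ rather than $F_1$, in agreement with the condition $k_r \gg 0$.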
Any unique system of representation is a number system; therefore
Zeckendorf’s theorem leads to the Fibonacci number system. We can represent
any nonnegative integer n as a sequence of 0’s and 1’s, writing
$$n = (b_mb_{m-1}\ldots b_2)_F \iff n = \sum_{k=2}^{m} b_kF_k.$$   (6.115)
This number system is something like binary (radix 2) notation, except that
there never are two adjacent 1’s. For example, here are the numbers from 1
to 20, expressed Fibonacci-wise:

  1 = (000001)_F     6 = (001001)_F    11 = (010100)_F    16 = (100100)_F
  2 = (000010)_F     7 = (001010)_F    12 = (010101)_F    17 = (100101)_F
  3 = (000100)_F     8 = (010000)_F    13 = (100000)_F    18 = (101000)_F
  4 = (000101)_F     9 = (010001)_F    14 = (100001)_F    19 = (101001)_F
  5 = (001000)_F    10 = (010010)_F    15 = (100010)_F    20 = (101010)_F

The Fibonacci representation of a million, shown a minute ago, can be
contrasted with its binary representation $2^{19} + 2^{18} + 2^{17} + 2^{16} + 2^{14} + 2^9 + 2^6$:
$$\begin{aligned}
(1000000)_{10} &= (10001010000000000010100000000)_F \\
               &= (11110100001001000000)_2.
\end{aligned}$$
The Fibonacci representation needs a few more bits because adjacent 1’s are
not permitted; but the two representations are analogous.
To add 1 in the Fibonacci number system, there are two cases: If the
“units digit” is 0, we change it to 1; that adds $F_2 = 1$, since the units digit
refers to $F_2$. Otherwise the two least significant digits will be 01, and we
change them to 10 (thereby adding $F_3 - F_2 = 1$). Finally, we must “carry”
as much as necessary by changing the digit pattern ‘011’ to ‘100’ until there
are no two 1’s in a row. (This carry rule is equivalent to replacing $F_{m+1} + F_m$
by $F_{m+2}$.) For example, to go from $5 = (1000)_F$ to $6 = (1001)_F$ or from
$6 = (1001)_F$ to $7 = (1010)_F$ requires no carrying; but to go from $7 = (1010)_F$
to $8 = (10000)_F$ we must carry twice.
[Margin note: “Sit 1 + x + 2xx + 3x^3 + 5x^4 + 8x^5 + 13x^6 + 21x^7 + 34x^8 &c Series nata ex divisione Unitatis per Trinomium 1 - x - xx.” -A. de Moivre [64]]
[Margin note: “The quantities r, s, t, which show the relation of the terms, are the same as those in the denominator of the fraction. This property, howsoever obvious it may be, M. DeMoivre was the first that applied it to use, in the solution of problems about infinite series, which otherwise would have been very intricate.” -J. Stirling [281]]
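The add-one rule with carries can be written as a few string operations. This is a sketch under our own conventions: digit strings are written most significant digit first, the last digit is the $F_2$ place, and the names `increment` and `fibonacci_value` are hypothetical.

```python
def increment(bits):
    """Add 1 to a Fibonacci-system digit string that has no two adjacent 1's."""
    if "11" in bits:
        raise ValueError("not a valid Fibonacci representation")
    s = "0" + bits                        # one leading zero leaves room for a final carry
    if s.endswith("0"):
        s = s[:-1] + "1"                  # units digit 0 -> 1 adds F_2 = 1
    else:
        s = s[:-2] + "10"                 # '01' -> '10' adds F_3 - F_2 = 1
    while "11" in s:
        s = s.replace("011", "100", 1)    # carry: F_{m+1} + F_m becomes F_{m+2}
    return s.lstrip("0") or "0"

def fibonacci_value(bits):
    """Integer denoted by a Fibonacci-system digit string."""
    a, b = 1, 2                           # F_2, F_3
    total = 0
    for digit in reversed(bits):
        if digit == "1":
            total += a
        a, b = b, a + b
    return total

print(increment("1010"))                  # 10000, that is, 7 + 1 = 8 after two carries
```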
So far we’ve been discussing lots of properties of the Fibonacci numbers,
but we haven’t come up with a closed formula for them. We haven’t found
closed forms for Stirling numbers, Eulerian numbers, or Bernoulli numbers
either; but we were able to discover the closed form
$H_n = {n+1 \brack 2}/n!$ for harmonic numbers. Is there a relation between $F_n$
and other quantities we know? Can we “solve” the recurrence that defines $F_n$?
The answer is yes. In fact, there’s a simple way to solve the recurrence by
using the idea of generating function that we looked at briefly in Chapter 5.
Let’s consider the infinite series
$$F(z) = F_0 + F_1z + F_2z^2 + \cdots = \sum_{n\ge0} F_nz^n.$$   (6.116)
If we can find a simple formula for F(z), chances are reasonably good that we
can find a simple formula for its coefficients $F_n$.
In Chapter 7 we will focus on generating functions in detail, but it will
be helpful to have this example under our belts by the time we get there.
The power series
F(z)
has a nice property if we look at what happens when
we multiply it by z and by z2:
$$\begin{aligned}
F(z)    &= F_0 + F_1z + F_2z^2 + F_3z^3 + F_4z^4 + F_5z^5 + \cdots,\\
zF(z)   &= \phantom{F_0 +{}} F_0z + F_1z^2 + F_2z^3 + F_3z^4 + F_4z^5 + \cdots,\\
z^2F(z) &= \phantom{F_0 + F_1z +{}} F_0z^2 + F_1z^3 + F_2z^4 + F_3z^5 + \cdots.
\end{aligned}$$
If we now subtract the last two equations from the first, the terms that involve
$z^2$, $z^3$, and higher powers of z will all disappear, because of the Fibonacci
recurrence. Furthermore the constant term $F_0$ never actually appeared in the
first place, because $F_0 = 0$. Therefore all that’s left after the subtraction is
$(F_1 - F_0)z$, which is just z. In other words,
$$F(z) - zF(z) - z^2F(z) = z,$$
and solving for F(z) gives us the compact formula
$$F(z) = \frac{z}{1 - z - z^2}.$$   (6.117)
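One way to convince ourselves that (6.117) really encodes the Fibonacci numbers is to expand the quotient as a power series by long division. The routine below is a generic sketch (the name `series_coefficients` and the list-of-coefficients convention are ours); it assumes the denominator has constant term 1.

```python
def series_coefficients(numerator, denominator, terms):
    """First `terms` power-series coefficients of numerator(z) / denominator(z).

    Polynomials are coefficient lists, lowest power first; denominator[0] must be 1.
    """
    if denominator[0] != 1:
        raise ValueError("denominator must have constant term 1")
    coeffs = []
    for n in range(terms):
        value = numerator[n] if n < len(numerator) else 0
        # coefficient of z^n in denominator * series must match the numerator
        for j in range(1, min(n, len(denominator) - 1) + 1):
            value -= denominator[j] * coeffs[n - j]
        coeffs.append(value)
    return coeffs

# z / (1 - z - z^2)
print(series_coefficients([0, 1], [1, -1, -1], 10))   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```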
We have now boiled down all the information in the Fibonacci sequence
to a simple (although unrecognizable) expression $z/(1 - z - z^2)$. This, believe
it or not, is progress, because we can factor the denominator and then use
partial fractions to achieve a formula that we can easily expand in power series.
The coefficients in this power series will be a closed form for the Fibonacci
numbers.
The plan of attack just sketched can perhaps be understood better if
we approach it backwards. If we have a simpler generating function, say
$1/(1 - \alpha z)$ where $\alpha$ is a constant, we know the coefficients of all powers of z,
because
$$\frac{1}{1 - \alpha z} = 1 + \alpha z + \alpha^2z^2 + \alpha^3z^3 + \cdots.$$
Similarly, if we have a generating function of the form
$A/(1 - \alpha z) + B/(1 - \beta z)$, the coefficients are easily determined, because
$$\begin{aligned}
\frac{A}{1 - \alpha z} + \frac{B}{1 - \beta z}
  &= A\sum_{n\ge0}(\alpha z)^n + B\sum_{n\ge0}(\beta z)^n \\
  &= \sum_{n\ge0}(A\alpha^n + B\beta^n)z^n.
\end{aligned}$$   (6.118)
Therefore all we have to do is find constants A, B, $\alpha$, and $\beta$ such that
$$\frac{A}{1 - \alpha z} + \frac{B}{1 - \beta z} = \frac{z}{1 - z - z^2},$$
and we will have found a closed form $A\alpha^n + B\beta^n$ for the coefficient $F_n$ of
$z^n$ in F(z). The left-hand side can be rewritten
$$\frac{A}{1 - \alpha z} + \frac{B}{1 - \beta z} = \frac{A - A\beta z + B - B\alpha z}{(1 - \alpha z)(1 - \beta z)},$$
so the four constants we seek are the solutions to two polynomial equations:
$$(1 - \alpha z)(1 - \beta z) = 1 - z - z^2;$$   (6.119)
$$(A + B) - (A\beta + B\alpha)z = z.$$   (6.120)
We want to factor the denominator of F(z) into the form $(1 - \alpha z)(1 - \beta z)$;
then we will be able to express F(z) as the sum of two fractions in which the
factors $(1 - \alpha z)$ and $(1 - \beta z)$ are conveniently separated from each other.
Notice that the denominator factors in (6.119) have been written in the
form $(1 - \alpha z)(1 - \beta z)$, instead of the more usual form $c(z - \rho_1)(z - \rho_2)$
where $\rho_1$ and $\rho_2$ are the roots. The reason is that $(1 - \alpha z)(1 - \beta z)$ leads to nicer
expansions in power series.
We can find $\alpha$ and $\beta$ in several ways, one of which uses a slick trick: Let
us introduce a new variable w and try to find the factorization
$$w^2 - wz - z^2 = (w - \alpha z)(w - \beta z).$$
Then we can simply set w = 1 and we’ll have the factors of $1 - z - z^2$. The
roots of $w^2 - wz - z^2 = 0$ can be found by the quadratic formula; they are
$$\frac{z \pm \sqrt{z^2 + 4z^2}}{2} = \frac{1 \pm \sqrt5}{2}\,z.$$
Therefore
$$w^2 - wz - z^2 = \Bigl(w - \frac{1+\sqrt5}{2}\,z\Bigr)\Bigl(w - \frac{1-\sqrt5}{2}\,z\Bigr)$$
and we have the constants $\alpha$ and $\beta$ we were looking for.
[Margin note: As usual, the authors can’t resist a trick.]
[Margin note: The ratio of one’s height to the height of one’s navel is approximately 1.618, according to extensive empirical observations by European scholars [110].]
The number $(1 + \sqrt5)/2 \approx 1.61803$ is important in many parts of mathematics
as well as in the art world, where it has been considered since ancient
times to be the most pleasing ratio for many kinds of design. Therefore it
has a special name, the golden ratio. We denote it by the Greek letter $\phi$, in
honor of Phidias who is said to have used it consciously in his sculpture. The
other root $(1 - \sqrt5)/2 = -1/\phi \approx -.61803$ shares many properties of $\phi$, so it
has the special name $\hat\phi$, “phi hat.” These numbers are roots of the equation
$w^2 - w - 1 = 0$, so we have
$$\phi^2 = \phi + 1; \qquad \hat\phi^2 = \hat\phi + 1.$$   (6.121)
(More about $\phi$ and $\hat\phi$ later.)
We have found the constants $\alpha = \phi$ and $\beta = \hat\phi$ needed in (6.119); now
we merely need to find A and B in (6.120). Setting z = 0 in that equation
tells us that B = -A, so (6.120) boils down to
$$-\hat\phi A + \phi A = 1.$$
The solution is $A = 1/(\phi - \hat\phi) = 1/\sqrt5$; the partial fraction expansion of
(6.117) is therefore
$$F(z) = \frac{1}{\sqrt5}\Bigl(\frac{1}{1 - \phi z} - \frac{1}{1 - \hat\phi z}\Bigr).$$   (6.122)
Good, we’ve got F(z) right where we want it. Expanding the fractions into
power series as in (6.118) gives a closed form for the coefficient of $z^n$:
$$F_n = \frac{1}{\sqrt5}\bigl(\phi^n - \hat\phi^n\bigr).$$   (6.123)
(This formula was first published by Leonhard Euler [91] in 1765, but people
forgot about it until it was rediscovered by Jacques Binet [25] in 1843.)
Before we stop to marvel at our derivation, we should check its accuracy.
For n = 0 the formula correctly gives $F_0 = 0$; for n = 1, it gives
$F_1 = (\phi - \hat\phi)/\sqrt5$, which is indeed 1. For higher powers, equations (6.121) show
that the numbers defined by (6.123) satisfy the Fibonacci recurrence, so they
must be the Fibonacci numbers by induction. (We could also expand $\phi^n$
and $\hat\phi^n$ by the binomial theorem and chase down the various powers of $\sqrt5$;
but that gets pretty messy. The point of a closed form is not necessarily to
provide us with a fast method of calculation, but rather to tell us how $F_n$
relates to other quantities in mathematics.)
With a little clairvoyance we could simply have guessed formula (6.123)
and proved it by induction. But the method of generating functions is a pow-
erful way to discover it; in Chapter 7 we’ll see that the same method leads us
to the solution of recurrences that are considerably more difficult. Inciden-
tally, we never worried about whether the infinite sums in our derivation of
(6.123) were convergent; it turns out that most operations on the coefficients
of power series can be justified rigorously whether or not the sums actually
converge
[151].
Still, skeptical readers who suspect fallacious reasoning with
infinite sums can take comfort in the fact that equation
(6.123),
once found
by using infinite series, can be verified by a solid induction proof.
One of the interesting consequences of (6.123) is that the integer $F_n$ is
extremely close to the irrational number $\phi^n/\sqrt5$ when n is large. (Since $\hat\phi$ is
less than 1 in absolute value, $\hat\phi^n$ becomes exponentially small and its effect
is almost negligible.) For example, $F_{10} = 55$ and $F_{11} = 89$ are very near
$$\frac{\phi^{10}}{\sqrt5} \approx 55.00364 \qquad\text{and}\qquad \frac{\phi^{11}}{\sqrt5} \approx 88.99775.$$
We can use this observation to derive another closed form,
$$F_n = \Bigl\lfloor \frac{\phi^n}{\sqrt5} + \frac12 \Bigr\rfloor = \frac{\phi^n}{\sqrt5} \text{ rounded to the nearest integer},$$   (6.124)
because $|\hat\phi^n/\sqrt5| < \frac12$ for all $n \ge 0$. When n is even, $F_n$ is a little bit less
than $\phi^n/\sqrt5$; otherwise it is a little greater.
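The two closed forms (6.123) and (6.124) can be compared with the recurrence in floating point. This is only a sketch: double-precision arithmetic makes the rounding reliable for moderate n (the check below stops at n = 40), and the names `fib_binet` and `fib_rounded` are ours.

```python
from math import floor, sqrt

PHI = (1 + sqrt(5)) / 2        # golden ratio
PHI_HAT = (1 - sqrt(5)) / 2    # "phi hat" = -1/phi

def fib(n):
    """F_n for n >= 0 from the recurrence, in exact integer arithmetic."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_binet(n):
    """Floating-point value of (phi^n - phi_hat^n) / sqrt(5), formula (6.123)."""
    return (PHI ** n - PHI_HAT ** n) / sqrt(5)

def fib_rounded(n):
    """phi^n / sqrt(5) rounded to the nearest integer, formula (6.124)."""
    return floor(PHI ** n / sqrt(5) + 0.5)

assert all(fib_rounded(n) == fib(n) for n in range(41))
```

For large n an exact method (integer recurrence or arithmetic in $\mathbf{Z}[\sqrt5]$) would be needed; the floating-point version is meant only to illustrate how small the $\hat\phi^n$ term is.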
Cassini’s identity (6.103) can be rewritten
$$\frac{F_{n+1}}{F_n} - \frac{F_n}{F_{n-1}} = \frac{(-1)^n}{F_{n-1}F_n}.$$
When n is large, $1/F_{n-1}F_n$ is very small, so $F_{n+1}/F_n$ must be very nearly the
same as $F_n/F_{n-1}$; and (6.124) tells us that this ratio approaches $\phi$. In fact,
we have
$$F_{n+1} = \phi F_n + \hat\phi^n.$$   (6.125)
(This identity is true by inspection when n = 0 or n = 1, and by induction
when n > 1; we can also prove it directly by plugging in (6.123).) The ratio
$F_{n+1}/F_n$ is very close to $\phi$, which it alternately overshoots and undershoots.
[Margin note: If the USA ever goes metric, our speed limit signs will go from 55 mi/hr to 89 km/hr. Or maybe the highway people will be generous and let us go 90.]
[Margin note: The “shift down” rule changes n to $f(n/\phi)$ and the “shift up” rule changes n to $f(n\phi)$, where $f(x) = \lfloor x + \phi^{-1}\rfloor$.]
By coincidence, $\phi$ is also very nearly the number of kilometers in a mile.
(The exact number is 1.609344, since 1 inch is exactly 2.54 centimeters.)
This gives us a handy way to convert mentally between kilometers and miles,
because a distance of $F_{n+1}$ kilometers is (very nearly) a distance of $F_n$ miles.
Suppose we want to convert a non-Fibonacci number from kilometers
to miles; what is 30 km, American style? Easy: We just use the Fibonacci
number system and mentally convert 30 to its Fibonacci representation 21 +
8 + 1 by the greedy approach explained earlier. Now we can shift each number
down one notch, getting 13 + 5 + 1. (The former ‘1’ was $F_2$, since $k_r \gg 0$ in
(6.113); the new ‘1’ is $F_1$.) Shifting down divides by $\phi$, more or less. Hence
19 miles is our estimate. (That’s pretty close; the correct answer is about
18.64 miles.) Similarly, to go from miles to kilometers we can shift up a
notch; 30 miles is approximately 34 + 13 + 2 = 49 kilometers. (That’s not
quite as close; the correct number is about 48.28.)
It turns out that this “shift down” rule gives the correctly rounded num-
ber of miles per n kilometers for all n < 100, except in the cases n = 4, 12,
62, 75, 91, and 96, when it is off by less than 2/3 mile. And the “shift up”
rule gives either the correctly rounded number of kilometers for n miles, or
1 km too many, for all n < 126. (The only really embarrassing case is n = 4,
where the individual rounding errors for n = 3 + 1 both go the same direction
instead of cancelling each other out.)
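The mental conversion trick is easy to mechanize: find the greedy Fibonacci representation and move every index down or up by one. The sketch below uses our own function names (`shift`, `km_to_miles`, `miles_to_km`) and reproduces the worked examples from the text.

```python
def shift(n, delta):
    """Shift each term F_k of the greedy Fibonacci representation of n to F_{k+delta}."""
    fibs = [0, 1]                              # fibs[k] = F_k
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    fibs.append(fibs[-1] + fibs[-2])           # one extra entry so that shifting up is safe
    total = 0
    k = len(fibs) - 2
    while n > 0:
        while fibs[k] > n:
            k -= 1
        total += fibs[k + delta]
        n -= fibs[k]
    return total

def km_to_miles(km):
    """'Shift down' rule: roughly divides by phi."""
    return shift(km, -1)

def miles_to_km(miles):
    """'Shift up' rule: roughly multiplies by phi."""
    return shift(miles, +1)

print(km_to_miles(30), miles_to_km(30))        # 19 49
```

The exact conversions are 30/1.609344, about 18.64 miles, and 30 x 1.609344, about 48.28 kilometers, as noted above.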
6.7 CONTINUANTS
Fibonacci numbers have important connections to the Stern-Brocot
tree that we studied in Chapter 4, and they have important generalizations to
a sequence of polynomials that Euler studied extensively. These polynomials
are called continuants, because they are the key to the study of continued
fractions like
$$a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cfrac{1}{a_4 + \cfrac{1}{a_5 + \cfrac{1}{a_6 + \cfrac{1}{a_7}}}}}}}$$   (6.126)
The continuant polynomial $K_n(x_1, x_2, \ldots, x_n)$ has n parameters, and it
is defined by the following recurrence:
$$\begin{aligned}
K_0() &= 1;\\
K_1(x_1) &= x_1;\\
K_n(x_1, \ldots, x_n) &= K_{n-1}(x_1, \ldots, x_{n-1})\,x_n + K_{n-2}(x_1, \ldots, x_{n-2}).
\end{aligned}$$   (6.127)
For example, the next three cases after $K_1(x_1)$ are
$$\begin{aligned}
K_2(x_1, x_2) &= x_1x_2 + 1;\\
K_3(x_1, x_2, x_3) &= x_1x_2x_3 + x_1 + x_3;\\
K_4(x_1, x_2, x_3, x_4) &= x_1x_2x_3x_4 + x_1x_2 + x_1x_4 + x_3x_4 + 1.
\end{aligned}$$
It’s easy to see, inductively, that the number of terms is a Fibonacci number:
$$K_n(1, 1, \ldots, 1) = F_{n+1}.$$   (6.128)
When the number of parameters is implied by the context, we can write
simply ‘K’ instead of ‘$K_n$’, just as we can omit the number of parameters
when we use the hypergeometric functions F of Chapter 5. For example,
$K(x_1, x_2) = K_2(x_1, x_2) = x_1x_2 + 1$. The subscript n is of course necessary in
formulas like (6.128).
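Recurrence (6.127) gives a one-pass way to evaluate a continuant numerically. In this sketch the function name `continuant` is ours, and the starting value 0 for a fictitious $K_{-1}$ is merely a programming convenience so that the loop also produces $K_1(x_1) = x_1$.

```python
def continuant(xs):
    """Evaluate K_n(x_1, ..., x_n) for a sequence xs, using recurrence (6.127)."""
    prev, cur = 0, 1               # (bootstrap value, K_0() = 1)
    for x in xs:
        prev, cur = cur, cur * x + prev
    return cur

def fib(n):
    """F_n for n >= 0."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(continuant([2, 3, 4]))       # 2*3*4 + 2 + 4 = 30
assert all(continuant([1] * n) == fib(n + 1) for n in range(20))   # (6.128)
```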
Euler observed that $K(x_1, x_2, \ldots, x_n)$ can be obtained by starting with
the product $x_1x_2\ldots x_n$ and then striking out adjacent pairs $x_kx_{k+1}$ in all
possible ways. We can represent Euler’s rule graphically by constructing all
“Morse code” sequences of dots and dashes having length n, where each dot
contributes 1 to the length and each dash contributes 2; here are the Morse
code sequences of length 4:

  ....     ..-     .-.     -..     --

These dot-dash patterns correspond to the terms of $K(x_1, x_2, x_3, x_4)$; a dot
signifies a variable that’s included and a dash signifies a pair of variables
that’s excluded. For example, ‘.-.’ corresponds to $x_1x_4$.
A Morse code sequence of length n that has k dashes has n - 2k dots and
n - k symbols altogether. These dots and dashes can be arranged in $\binom{n-k}{k}$
ways; therefore if we replace each dot by z and each dash by 1 we get
$$K_n(z, z, \ldots, z) = \sum_{k=0}^{n} \binom{n-k}{k} z^{n-2k}.$$   (6.129)
We also know that the total number of terms in a continuant is a Fibonacci
number; hence we have the identity
$$F_{n+1} = \sum_{k=0}^{n} \binom{n-k}{k}.$$   (6.130)
(A closed form for (6.129), generalizing the Euler-Binet formula (6.123) for
Fibonacci numbers, appears in (5.74).)
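Euler’s strike-out rule can be tested by brute force: enumerate the Morse code sequences, multiply the variables under the dots, and compare with the recurrence. The sketch below uses hypothetical function names and plain integers in place of the symbolic variables.

```python
from math import comb

def morse_sequences(n):
    """Yield all dot/dash strings of total length n (dot counts 1, dash counts 2)."""
    if n == 0:
        yield ""
    if n >= 1:
        for rest in morse_sequences(n - 1):
            yield "." + rest
    if n >= 2:
        for rest in morse_sequences(n - 2):
            yield "-" + rest

def continuant_by_morse(xs):
    """Sum, over all Morse sequences, of the product of the variables under the dots."""
    total = 0
    for seq in morse_sequences(len(xs)):
        term, pos = 1, 0
        for symbol in seq:
            if symbol == ".":
                term *= xs[pos]        # a dot keeps one variable
                pos += 1
            else:
                pos += 2               # a dash strikes out an adjacent pair
        total += term
    return total

print(list(morse_sequences(4)))        # ['....', '..-', '.-.', '-..', '--']
```

Counting the sequences of length n gives the left side of (6.130); grouping them by the number of dashes gives the right side.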
The relation between continuant polynomials and Morse code sequences
shows that continuants have a mirror symmetry:
$$K(x_n, \ldots, x_2, x_1) = K(x_1, x_2, \ldots, x_n).$$   (6.131)
Therefore they obey a recurrence that adjusts parameters at the left, in addition
to the right-adjusting recurrence in definition (6.127):
$$K_n(x_1, \ldots, x_n) = x_1K_{n-1}(x_2, \ldots, x_n) + K_{n-2}(x_3, \ldots, x_n).$$   (6.132)
Both of these recurrences are special cases of a more general law:
$$\begin{aligned}
K_{m+n}(x_1, \ldots, x_m, x_{m+1}, \ldots, x_{m+n})
  &= K_m(x_1, \ldots, x_m)\,K_n(x_{m+1}, \ldots, x_{m+n}) \\
  &\quad + K_{m-1}(x_1, \ldots, x_{m-1})\,K_{n-1}(x_{m+2}, \ldots, x_{m+n}).
\end{aligned}$$   (6.133)
This law is easily understood from the Morse code analogy: The first product
$K_mK_n$ yields the terms of $K_{m+n}$ in which there is no dash in the [m, m + 1]
position, while the second product yields the terms in which there is a dash
there. If we set all the x’s equal to 1, this identity tells us that
$F_{m+n+1} = F_{m+1}F_{n+1} + F_mF_n$; thus, (6.108) is a special case of (6.133).
Euler [90] discovered that continuants obey an even more remarkable law,
which generalizes Cassini’s identity:
$$\begin{aligned}
K_{m+n}(x_1, \ldots, x_{m+n})\,K_k(x_{m+1}, \ldots, x_{m+k})
  &= K_{m+k}(x_1, \ldots, x_{m+k})\,K_n(x_{m+1}, \ldots, x_{m+n}) \\
  &\quad + (-1)^k K_{m-1}(x_1, \ldots, x_{m-1})\,K_{n-k-1}(x_{m+k+2}, \ldots, x_{m+n}).
\end{aligned}$$   (6.134)
This law (proved in exercise 29) holds whenever the subscripts on the K’s are
all nonnegative. For example, when k = 2, m = 1, and n = 3, we have
$$K(x_1, x_2, x_3, x_4)\,K(x_2, x_3) = K(x_1, x_2, x_3)\,K(x_2, x_3, x_4) + 1.$$
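Index bookkeeping in (6.133) and (6.134) is error-prone, so a numerical spot check is reassuring. In the sketch below the parameters are stored in a zero-based Python list, so $x_i$ is `xs[i - 1]`; the helper names are ours, and both checks assume $m \ge 1$ and, for Euler’s law, $0 \le k \le n - 1$.

```python
def continuant(xs):
    """K_n(x_1, ..., x_n) by recurrence (6.127)."""
    prev, cur = 0, 1
    for x in xs:
        prev, cur = cur, cur * x + prev
    return cur

def general_law_holds(xs, m):
    """Check (6.133) for the split after x_m."""
    return continuant(xs) == (continuant(xs[:m]) * continuant(xs[m:])
                              + continuant(xs[:m - 1]) * continuant(xs[m + 1:]))

def euler_law_holds(xs, m, k):
    """Check (6.134) with m + n = len(xs)."""
    lhs = continuant(xs) * continuant(xs[m:m + k])
    rhs = (continuant(xs[:m + k]) * continuant(xs[m:])
           + (-1) ** k * continuant(xs[:m - 1]) * continuant(xs[m + k + 1:]))
    return lhs == rhs

# The example from the text: k = 2, m = 1, n = 3.
assert euler_law_holds([2, 3, 4, 5], 1, 2)
```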
Continuant polynomials are intimately connected with Euclid’s algorithm.
Suppose, for example, that the computation of gcd(m, n) finishes
in four steps:
$$\begin{aligned}
\gcd(m, n) &= \gcd(n_0, n_1) & n_0 &= m, \quad n_1 = n;\\
           &= \gcd(n_1, n_2) & n_2 &= n_0 \bmod n_1 = n_0 - q_1n_1;\\
           &= \gcd(n_2, n_3) & n_3 &= n_1 \bmod n_2 = n_1 - q_2n_2;\\
           &= \gcd(n_3, n_4) & n_4 &= n_2 \bmod n_3 = n_2 - q_3n_3;\\
           &= \gcd(n_4, 0) = n_4 & 0 &= n_3 \bmod n_4 = n_3 - q_4n_4.
\end{aligned}$$
Then we have
$$\begin{aligned}
n_4 &= n_4 = K()\,n_4;\\
n_3 &= q_4n_4 = K(q_4)\,n_4;\\
n_2 &= q_3n_3 + n_4 = K(q_3, q_4)\,n_4;\\
n_1 &= q_2n_2 + n_3 = K(q_2, q_3, q_4)\,n_4;\\
n_0 &= q_1n_1 + n_2 = K(q_1, q_2, q_3, q_4)\,n_4.
\end{aligned}$$
In general, if Euclid’s algorithm finds the greatest common divisor d in k steps,
after computing the sequence of quotients $q_1, \ldots, q_k$, then the starting numbers
were $K(q_1, q_2, \ldots, q_k)d$ and $K(q_2, \ldots, q_k)d$. (This fact was noticed early
in the eighteenth century by Thomas Fantet de Lagny [190], who seems to
have been the first person to consider continuants explicitly. Lagny pointed
out that consecutive Fibonacci numbers, which occur as continuants when the
q’s take their minimum values, are therefore the smallest inputs that cause
Euclid’s algorithm to take a given number of steps.)
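The connection with Euclid’s algorithm can be checked by recording the quotients and feeding them back into the continuant recurrence. A sketch, with our own function names:

```python
def continuant(xs):
    """K_n(x_1, ..., x_n) by recurrence (6.127)."""
    prev, cur = 0, 1
    for x in xs:
        prev, cur = cur, cur * x + prev
    return cur

def euclid_quotients(m, n):
    """Run Euclid's algorithm on (m, n); return (list of quotients, gcd)."""
    quotients = []
    while n:
        q, r = divmod(m, n)
        quotients.append(q)
        m, n = n, r
    return quotients, m

qs, d = euclid_quotients(34, 21)
print(qs, d)                       # [1, 1, 1, 1, 1, 1, 2] 1
assert continuant(qs) * d == 34 and continuant(qs[1:]) * d == 21
```

Consecutive Fibonacci numbers such as 34 and 21 produce quotients that are all 1 except the last, which is the extreme case Lagny pointed out.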
Continuants are also intimately connected with continued fractions, from
which they get their name. We have, for example,
$$a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3}}} = \frac{K(a_0, a_1, a_2, a_3)}{K(a_1, a_2, a_3)}.$$   (6.135)
The same pattern holds for continued fractions of any depth. It is easily
proved by induction; we have, for example,
$$\frac{K(a_0, a_1, a_2, a_3 + 1/a_4)}{K(a_1, a_2, a_3 + 1/a_4)} = \frac{K(a_0, a_1, a_2, a_3, a_4)}{K(a_1, a_2, a_3, a_4)},$$
because of the identity
$$K_n(x_1, \ldots, x_{n-1}, x_n + y) = K_n(x_1, \ldots, x_{n-1}, x_n) + K_{n-1}(x_1, \ldots, x_{n-1})\,y.$$   (6.136)
(This identity is proved and generalized in exercise 30.)
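Formula (6.135) can be compared with a direct bottom-up evaluation of the continued fraction in exact rational arithmetic. The sketch assumes positive partial quotients after the first, so that no division by zero occurs; the function names are ours.

```python
from fractions import Fraction

def continuant(xs):
    """K_n(x_1, ..., x_n) by recurrence (6.127)."""
    prev, cur = 0, 1
    for x in xs:
        prev, cur = cur, cur * x + prev
    return cur

def continued_fraction(a):
    """Evaluate a_0 + 1/(a_1 + 1/(... + 1/a_n)) from the innermost term outward."""
    value = Fraction(a[-1])
    for term in reversed(a[:-1]):
        value = term + 1 / value
    return value

a = [1, 2, 3, 4]
print(continued_fraction(a))                          # 43/30
assert continued_fraction(a) == Fraction(continuant(a), continuant(a[1:]))
```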
Moreover, continuants are closely connected with the Stern-Brocot tree
discussed in Chapter 4. Each node in that tree can be represented as a
sequence of L’s and R’s, say
$$R^{a_0}L^{a_1}R^{a_2}L^{a_3}\ldots R^{a_{n-2}}L^{a_{n-1}},$$   (6.137)
where $a_0 \ge 0$, $a_1 \ge 1$, $a_2 \ge 1$, $a_3 \ge 1$, ..., $a_{n-2} \ge 1$, $a_{n-1} \ge 0$, and n is
even. Using the 2 x 2 matrices L and R of (4.33), it is not hard to prove by
induction that the matrix equivalent of (6.137) is
$$\begin{pmatrix}
K_{n-2}(a_1, \ldots, a_{n-2}) & K_{n-1}(a_1, \ldots, a_{n-2}, a_{n-1}) \\
K_{n-1}(a_0, a_1, \ldots, a_{n-2}) & K_n(a_0, a_1, \ldots, a_{n-2}, a_{n-1})
\end{pmatrix}.$$   (6.138)
(The proof is part of exercise 80.) For example,
$$R^aL^bR^cL^d = \begin{pmatrix} bc + 1 & bcd + b + d \\ abc + a + c & abcd + ab + ad + cd + 1 \end{pmatrix}.$$
Finally, therefore, we can use (4.34) to write a closed form for the fraction in
the Stern-Brocot tree whose L-and-R representation is (6.137):
$$f(R^{a_0}\ldots L^{a_{n-1}}) = \frac{K_{n+1}(a_0, a_1, \ldots, a_{n-1}, 1)}{K_n(a_1, \ldots, a_{n-1}, 1)}.$$   (6.139)
(This is “Halphen’s theorem” [143].) For example, to find the fraction for
LRRL we have $a_0 = 0$, $a_1 = 1$, $a_2 = 2$, $a_3 = 1$, and n = 4; equation (6.139)
gives
$$\frac{K(0, 1, 2, 1, 1)}{K(1, 2, 1, 1)} = \frac{K(2, 1, 1)}{K(1, 2, 1, 1)} = \frac{K(2, 2)}{K(3, 2)} = \frac57.$$
(We have used the rule $K_n(x_1, \ldots, x_{n-1}, x_n + 1) = K_{n+1}(x_1, \ldots, x_{n-1}, x_n, 1)$ to
absorb leading and trailing 1’s in the parameter lists; this rule is obtained by
setting y = 1 in (6.136).)
A comparison of (6.135) and (6.139) shows that the fraction corresponding
to a general node (6.137) in the Stern-Brocot tree has the continued
fraction representation
$$f(R^{a_0}\ldots L^{a_{n-1}}) = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{\ddots + \cfrac{1}{a_{n-1} + \cfrac{1}{1}}}}}.$$   (6.140)
Thus we can convert at sight between continued fractions and the corresponding
nodes in the Stern-Brocot tree. For example,
$$f(LRRL) = 0 + \cfrac{1}{1 + \cfrac{1}{2 + \cfrac{1}{1 + \cfrac{1}{1}}}}.$$
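Halphen’s formula (6.139) can be compared with a direct walk down the Stern-Brocot tree. The sketch below makes two assumptions of its own: the walk uses the mediant construction of Chapter 4 with bounding fractions 0/1 and 1/0, and the run lengths are padded with a leading or trailing zero so that the path always has the shape (6.137). All function names are hypothetical.

```python
from fractions import Fraction
from itertools import groupby

def continuant(xs):
    """K_n(x_1, ..., x_n) by recurrence (6.127)."""
    prev, cur = 0, 1
    for x in xs:
        prev, cur = cur, cur * x + prev
    return cur

def walk(path):
    """Fraction reached by following a string of L's and R's from the root 1/1."""
    lo_num, lo_den, hi_num, hi_den = 0, 1, 1, 0
    for step in path:
        mid_num, mid_den = lo_num + hi_num, lo_den + hi_den
        if step == "L":
            hi_num, hi_den = mid_num, mid_den      # go left: mediant becomes upper bound
        else:
            lo_num, lo_den = mid_num, mid_den      # go right: mediant becomes lower bound
    return Fraction(lo_num + hi_num, lo_den + hi_den)

def exponents(path):
    """Exponents a_0, ..., a_{n-1} of R^{a_0} L^{a_1} ... L^{a_{n-1}}, with n even."""
    runs = [len(list(group)) for _, group in groupby(path)]
    if path.startswith("L"):
        runs.insert(0, 0)                          # a_0 = 0 when the path starts with L
    if len(runs) % 2:
        runs.append(0)                             # a_{n-1} = 0 when the path ends with R
    return runs

def halphen(path):
    """Right-hand side of (6.139)."""
    a = exponents(path)
    return Fraction(continuant(a + [1]), continuant(a[1:] + [1]))

print(walk("LRRL"), halphen("LRRL"))               # 5/7 5/7
```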
We observed in Chapter 4 that irrational numbers define infinite paths
in the Stern-Brocot tree, and that they can be represented as an infinite
string of L’s and R’s. If the infinite string for $\alpha$ is $R^{a_0}L^{a_1}R^{a_2}L^{a_3}\ldots$, there is
a corresponding infinite continued fraction
$$\alpha = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cfrac{1}{a_4 + \cfrac{1}{a_5 + \cfrac{1}{\ddots}}}}}}.$$   (6.141)
This infinite continued fraction can also be obtained directly: Let $\alpha_0 = \alpha$ and
for $k \ge 0$ let
$$a_k = \lfloor\alpha_k\rfloor; \qquad \alpha_k = a_k + \frac{1}{\alpha_{k+1}}.$$   (6.142)
The a’s are called the “partial quotients” of $\alpha$. If $\alpha$ is rational, say m/n,
this process runs through the quotients found by Euclid’s algorithm and then
stops (with $\alpha_{k+1} = \infty$).
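For a rational starting value, process (6.142) can be carried out exactly. The sketch below (the name `partial_quotients` and the `limit` safeguard are our own) stops when $\alpha_k$ is an integer, which is the point where $\alpha_{k+1}$ would be infinite.

```python
from fractions import Fraction
from math import floor

def partial_quotients(alpha, limit=100):
    """Partial quotients a_0, a_1, ... of a rational alpha, following (6.142)."""
    alpha = Fraction(alpha)
    quotients = []
    for _ in range(limit):
        a = floor(alpha)
        quotients.append(a)
        if alpha == a:                  # alpha_{k+1} would be infinite: stop
            break
        alpha = 1 / (alpha - a)
    return quotients

print(partial_quotients(Fraction(5, 7)))     # [0, 1, 2, 2]
```

The result [0, 1, 2, 2] for 5/7 agrees with the representation 0, 1, 2, 1, 1 obtained from the path LRRL, because a trailing 1 can be absorbed into the preceding quotient.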
Is Euler’s constant $\gamma$ rational or irrational? Nobody knows. We can get
partial information about this famous unsolved problem by looking for $\gamma$ in
the Stern-Brocot tree; if it’s rational we will find it, and if it’s irrational we
will find all the closest rational approximations to it. The continued fraction
for $\gamma$ begins with the following partial quotients:

  k   | 0  1  2  3  4  5  6  7  8
  a_k | 0  1  1  2  1  2  1  4  3

Therefore its Stern-Brocot representation begins LRLLRLLRLLLLRRRL...; no
pattern is evident. Calculations by Richard Brent [33] have shown that, if $\gamma$
is rational, its denominator must be more than 10,000 decimal digits long.
Therefore nobody believes that $\gamma$ is rational; but nobody so far has been able
to prove that it isn’t.
[Margin note: Or if they do, they’re not talking.]
[Margin note: Well, $\gamma$ must be irrational, because of a little-known Einsteinian assertion: “God does not throw huge denominators at the universe.”]
Let’s conclude this chapter by proving a remarkable identity that ties a lot
of these ideas together. We introduced the notion of spectrum in Chapter 3;
the spectrum of $\alpha$ is the multiset of numbers $\lfloor n\alpha\rfloor$, where $\alpha$ is a given constant.
The infinite series
$$\sum_{n\ge1} z^{\lfloor n\phi\rfloor} = z + z^3 + z^4 + z^6 + z^8 + z^9 + \cdots$$
can therefore be said to be the generating function for the spectrum of $\phi$,
where $\phi = (1 + \sqrt5)/2$ is the golden ratio. The identity we will prove,
discovered in 1976 by J.L. Davison [61], is an infinite continued fraction that
relates this generating function to the Fibonacci sequence:
$$\cfrac{z^{F_1}}{1 + \cfrac{z^{F_2}}{1 + \cfrac{z^{F_3}}{1 + \cfrac{z^{F_4}}{\ddots}}}} = (1 - z)\sum_{n\ge1} z^{\lfloor n\phi\rfloor}.$$   (6.143)
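Both sides of (6.143) are interesting; let’s look first at the numbers $\lfloor n\phi\rfloor$.
If the Fibonacci representation (6.113) of $n$ is $F_{k_1} + \cdots + F_{k_r}$, we expect $n\phi$
to be approximately $F_{k_1+1} + \cdots + F_{k_r+1}$, the number we get from shifting the
Fibonacci representation left (as when converting from miles to kilometers).
In fact, we know from (6.125) that
$$n\phi = F_{k_1+1} + \cdots + F_{k_r+1} - \bigl(\hat\phi^{k_1} + \cdots + \hat\phi^{k_r}\bigr).$$
Now $\hat\phi = -1/\phi$ and $k_1 \gg \cdots \gg k_r \gg 0$, so we have
$$\bigl|\hat\phi^{k_1} + \cdots + \hat\phi^{k_r}\bigr| < \phi^{-k_r} + \phi^{-k_r-2} + \phi^{-k_r-4} + \cdots = \frac{\phi^{-k_r}}{1-\phi^{-2}} = \phi^{1-k_r} \le \phi^{-1} < 1;$$
and $\hat\phi^{k_1} + \cdots + \hat\phi^{k_r}$ has the same sign as $(-1)^{k_r}$, by a similar argument. Hence
$$\lfloor n\phi\rfloor = F_{k_1+1} + \cdots + F_{k_r+1} - [k_r(n)\text{ is even}]. \qquad (6.144)$$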
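Formula (6.144) is easy to test numerically. The following is a minimal sketch, not part of the original text; the helper names (`fib_upto`, `zeckendorf`, `floor_n_phi`, `shifted_formula`) are illustrative assumptions. It computes $\lfloor n\phi\rfloor$ exactly with integer square roots and compares it with the shifted Fibonacci representation.

```python
from math import isqrt

def fib_upto(limit):
    """Return [F_0, F_1, ...], extended until the last value exceeds limit."""
    fibs = [0, 1]
    while fibs[-1] <= limit:
        fibs.append(fibs[-1] + fibs[-2])
    return fibs

def zeckendorf(n):
    """Indices k_1 > ... > k_r (each >= 2) of the Fibonacci representation of n >= 1."""
    fibs = fib_upto(n)
    indices = []
    k = len(fibs) - 1
    while n > 0:
        if fibs[k] <= n:          # greedy choice: largest Fibonacci number that fits
            indices.append(k)
            n -= fibs[k]
        k -= 1
    return indices

def floor_n_phi(n):
    """Exact floor(n * (1 + sqrt(5)) / 2), using isqrt(5 n^2) = floor(n * sqrt(5))."""
    return (n + isqrt(5 * n * n)) // 2

def shifted_formula(n):
    """Right-hand side of (6.144): shift the representation left, subtract [k_r even]."""
    ks = zeckendorf(n)
    fibs = fib_upto(2 * n + 2)
    return sum(fibs[k + 1] for k in ks) - (1 if ks[-1] % 2 == 0 else 0)

# Compare both sides of (6.144) for a range of n.
mismatches = [n for n in range(1, 2000) if floor_n_phi(n) != shifted_formula(n)]
print(mismatches)
```

An empty list of mismatches is what (6.144) predicts; the check is only evidence for small n, not a proof.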
Let us say that a number $n$ is Fibonacci odd (or F-odd for short) if its least
significant Fibonacci bit is 1; this is the same as saying that $k_r(n) = 2$.
Otherwise $n$ is Fibonacci even (F-even). For example, the smallest F-odd
numbers are 1, 4, 6, 9, 12, 14, 17, and 19. If $k_r(n)$ is even, then $n - 1$ is
F-even, by (6.114); similarly, if $k_r(n)$ is odd, then $n - 1$ is F-odd. Therefore
$$k_r(n)\text{ is even} \iff n - 1\text{ is F-even}.$$
Furthermore, if $k_r(n)$ is even, (6.144) implies that $k_r(\lfloor n\phi\rfloor) = 2$; if $k_r(n)$ is
odd, (6.144) says that $k_r(\lfloor n\phi\rfloor) = k_r(n) + 1$. Therefore $k_r(\lfloor n\phi\rfloor)$ is always
even, and we have proved that
$$\lfloor n\phi\rfloor - 1\text{ is always F-even}.$$
Conversely, if $m$ is any F-even number, we can reverse this computation and
find an $n$ such that $m + 1 = \lfloor n\phi\rfloor$. (First add 1 in F-notation as explained
earlier. If no carries occur, $n$ is $(m + 2)$ shifted right; otherwise $n$ is $(m + 1)$
shifted right.) The right-hand sum of (6.143) can therefore be written
$$\sum_{n\ge1} z^{\lfloor n\phi\rfloor} = z\sum_{m\ge0} z^m\,[m\text{ is F-even}]. \qquad (6.145)$$
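The correspondence behind (6.145) can be checked by brute force. This is a small sketch under the assumption that 0 counts as F-even (its Fibonacci representation is empty); the function names are hypothetical. It lists the F-odd numbers and confirms that the numbers $\lfloor n\phi\rfloor - 1$ are exactly the F-even numbers below a chosen limit.

```python
from math import isqrt

def least_fib_index(m):
    """Index k_r of the smallest term in the Fibonacci representation of m >= 1."""
    fibs = [0, 1]
    while fibs[-1] <= m:
        fibs.append(fibs[-1] + fibs[-2])
    k = len(fibs) - 1
    last = None
    while m > 0:
        if fibs[k] <= m:          # greedy choice of the largest term that fits
            m -= fibs[k]
            last = k
        k -= 1
    return last

def is_f_even(m):
    """m is F-even when its representation has no F_2 term; 0 is treated as F-even."""
    return m == 0 or least_fib_index(m) != 2

LIMIT = 500
f_odd_small = [m for m in range(1, 20) if not is_f_even(m)]
f_even = {m for m in range(LIMIT) if is_f_even(m)}
# floor(n*phi) - 1, computed exactly, restricted to values below LIMIT
spectrum_minus_one = {(n + isqrt(5 * n * n)) // 2 - 1 for n in range(1, LIMIT + 1)}
spectrum_minus_one = {m for m in spectrum_minus_one if m < LIMIT}

print(f_odd_small)
print(spectrum_minus_one == f_even)
```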
How about the fraction on the left? Let’s rewrite (6.143) so that the
continued fraction looks like (6.141), with all numerators 1:
$$\cfrac{1}{z^{-F_0}+\cfrac{1}{z^{-F_1}+\cfrac{1}{z^{-F_2}+\cdots}}} = \frac{1-z}{z}\sum_{n\ge1} z^{\lfloor n\phi\rfloor}. \qquad (6.146)$$
(This transformation is a bit tricky! The numerator and denominator of the
original fraction having $z^{F_n}$ as numerator should be divided by $z^{F_{n-1}}$.) If
we stop this new continued fraction at $1/z^{-F_n}$, its value will be a ratio of
continuants,
$$\frac{K_{n+2}(0, z^{-F_0}, z^{-F_1}, \ldots, z^{-F_n})}{K_{n+1}(z^{-F_0}, z^{-F_1}, \ldots, z^{-F_n})} = \frac{K_n(z^{-F_1}, \ldots, z^{-F_n})}{K_{n+1}(z^{-F_0}, z^{-F_1}, \ldots, z^{-F_n})},$$
as in (6.135). Let’s look at the denominator first, in hopes that it will be
tractable. Setting $Q_n = K_{n+1}(z^{-F_0}, \ldots, z^{-F_n})$, we find $Q_0 = 1$, $Q_1 = 1 + z^{-1}$,
$Q_2 = 1 + z^{-1} + z^{-2}$, $Q_3 = 1 + z^{-1} + z^{-2} + z^{-3} + z^{-4}$, and in general everything
fits beautifully and gives a geometric series
$$Q_n = 1 + z^{-1} + z^{-2} + \cdots + z^{-(F_{n+2}-1)}.$$
The corresponding numerator is $P_n = K_n(z^{-F_1}, \ldots, z^{-F_n})$; this turns out to
be like $Q_n$ but with fewer terms. For example, we have
$$P_5 = z^{-1} + z^{-2} + z^{-4} + z^{-5} + z^{-7} + z^{-9} + z^{-10} + z^{-12},$$
compared with $Q_5 = 1 + z^{-1} + \cdots + z^{-12}$. A closer look reveals the pattern
governing which terms are present: We have
$$P_5 = \frac{1 + z^2 + z^3 + z^5 + z^7 + z^8 + z^{10} + z^{11}}{z^{12}} = z^{-12}\sum_{m=0}^{12} z^m\,[m\text{ is F-even}];$$
and in general we can prove by induction that
$$P_n = z^{1-F_{n+2}}\sum_{m=0}^{F_{n+2}-1} z^m\,[m\text{ is F-even}].$$
Therefore
$$\frac{P_n}{Q_n} = \frac{\sum_{m=0}^{F_{n+2}-1} z^m\,[m\text{ is F-even}]}{\sum_{m=0}^{F_{n+2}-1} z^m}.$$
Taking the limit as $n \to \infty$ now gives (6.146), because of (6.145).
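The closed forms for $Q_n$ and $P_n$ can be confirmed mechanically. The sketch below is an illustration only; its representation is an assumption: a polynomial in $w = 1/z$ is stored as a dictionary mapping exponents to coefficients, and the continuant is built from the recurrence $K_n = K_{n-1}x_n + K_{n-2}$. The F-even numbers are taken to be the values $\lfloor k\phi\rfloor - 1$, as established in the text.

```python
from math import isqrt

def fibonacci(count):
    """Return [F_0, F_1, ..., F_{count-1}]."""
    fibs = [0, 1]
    while len(fibs) < count:
        fibs.append(fibs[-1] + fibs[-2])
    return fibs

def continuant(exponents):
    """K(w^e1, ..., w^en) as {power of w: coefficient}, via K_n = K_{n-1} x_n + K_{n-2}."""
    prev, cur = {}, {0: 1}                 # K_{-1} = 0 and K_0 = 1
    for e in exponents:
        nxt = dict(prev)                   # start from K_{n-2}
        for power, coeff in cur.items():   # add K_{n-1} * w^e
            nxt[power + e] = nxt.get(power + e, 0) + coeff
        prev, cur = cur, nxt
    return cur

F = fibonacci(20)

def Q(n):
    return continuant(F[0:n + 1])          # K_{n+1}(w^{F_0}, ..., w^{F_n})

def P(n):
    return continuant(F[1:n + 1])          # K_n(w^{F_1}, ..., w^{F_n})

def f_even_below(limit):
    """F-even numbers below limit, using floor(k*phi) - 1 for k >= 1."""
    values = {(k + isqrt(5 * k * k)) // 2 - 1 for k in range(1, limit + 1)}
    return {m for m in values if m < limit}

print(sorted(P(5)))   # exponents of 1/z that occur in P_5
print(sorted(Q(5)))   # exponents of 1/z that occur in Q_5
```

Here `sorted(P(5))` lists the exponents of $1/z$ in $P_5$, which can be compared with the displayed expansion.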
Exercises
Warmups
1 What are the ${4 \brack 2} = 11$ permutations of {1, 2, 3, 4} that have exactly two
cycles? (The cyclic forms appear in (6.4); non-cyclic forms like 2314 are
desired instead.)
2 There are $m^n$ functions from a set of $n$ elements into a set of $m$ elements.
How many of them range over exactly $k$ different function values?
3 Card stackers in the real world know that it’s wise to allow a bit of slack
so that the cards will not topple over when a breath of wind comes along.
Suppose the center of gravity of the top $k$ cards is required to be at least
$\epsilon$ units from the edge of the $(k+1)$st card. (Thus, for example, the first
card can overhang the second by at most $1 - \epsilon$ units.) Can we still achieve
arbitrarily large overhang, if we have enough cards?
4 Express $1/1 + 1/3 + \cdots + 1/(2n+1)$ in terms of harmonic numbers.
5 Explain how to get the recurrence (6.75) from the definition of $U_n(x, y)$ in
(6.74), and solve the recurrence.
6 An explorer has left a pair of baby rabbits on an island. If baby rabbits
become adults after one month, and if each pair of adult rabbits produces
one pair of baby rabbits every month, how many pairs of rabbits are
present after $n$ months? (After two months there are two pairs, one of
which is newborn.) Find a connection between this problem and the “bee
tree” in the text.
7 Show that Cassini’s identity (6.103) is a special case of (6.108), and a
special case of (6.134).
8 Use the Fibonacci number system to convert 65 mi/hr into an approximate
number of km/hr.
9 About how many square kilometers are in 8 square miles?
10 What is the continued fraction representation of $\phi$?
Basics
11 What is $\sum_k (-1)^k {n \brack k}$, the row sum of Stirling’s cycle-number triangle
with alternating signs, when $n$ is a nonnegative integer?
12 Prove that Stirling numbers have an inversion law analogous to (5.48):
$$g(n) = \sum_k {n \brace k}(-1)^k f(k) \iff f(n) = \sum_k {n \brack k}(-1)^k g(k).$$
13 The differential operators $D = \frac{d}{dz}$ and $\vartheta = zD$ are mentioned in Chapters
2 and 5. We have $\vartheta^2 = z^2D^2 + zD$, because
$\vartheta^2 f(z) = \vartheta z f'(z) = z\frac{d}{dz} z f'(z) = z^2 f''(z) + z f'(z)$, which is
$(z^2D^2 + zD)f(z)$. Similarly it can be shown that $\vartheta^3 = z^3D^3 + 3z^2D^2 + zD$.
Prove the general formulas
$$\vartheta^n = \sum_k {n \brace k} z^k D^k, \qquad z^n D^n = \sum_k {n \brack k}(-1)^{n-k}\vartheta^k,$$
for all $n \ge 0$. (These can be used to convert between differential expressions
of the forms $\sum_k \alpha_k z^k f^{(k)}(z)$ and $\sum_k \beta_k \vartheta^k f(z)$, as in (5.109).)
14 Prove the power identity (6.37) for Eulerian numbers.
15 Prove the Eulerian identity (6.39) by taking the $m$th difference of (6.37).
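16 What is the general solution of the double recurrence
$$A_{n,0} = a_n\,[n \ge 0]; \qquad A_{0,k} = 0, \text{ if } k > 0;$$
$$A_{n,k} = kA_{n-1,k} + A_{n-1,k-1}, \qquad\text{integers } k, n,$$
when $k$ and $n$ range over the set of all integers?
17 Solve the following recurrences, assuming that $\left|{n \atop k}\right|$ is zero when $n < 0$ or
$k < 0$:
a $\left|{n \atop k}\right| = \left|{n-1 \atop k}\right| + n\left|{n-1 \atop k-1}\right| + [n = k = 0]$, for $n, k \ge 0$.
b $\left|{n \atop k}\right| = (n-k)\left|{n-1 \atop k}\right| + \left|{n-1 \atop k-1}\right| + [n = k = 0]$, for $n, k \ge 0$.
c $\left|{n \atop k}\right| = k\left|{n-1 \atop k}\right| + k\left|{n-1 \atop k-1}\right| + [n = k = 0]$, for $n, k \ge 0$.
18 Prove that the Stirling polynomials satisfy
$$(x+1)\,\sigma_n(x+1) = (x-n)\,\sigma_n(x) + x\,\sigma_{n-1}(x).$$
19 Prove that the generalized Stirling numbers satisfy
$$\sum_{k=0}^{n} {x+k \brace x}{x \brack x-n+k}(-1)^k \Big/ \binom{x+k}{n+1} = 0, \qquad\text{integer } n > 0.$$
$$\sum_{k=0}^{n} {x+k \brack x}{x \brace x-n+k}(-1)^k \Big/ \binom{x+k}{n+1} = 0, \qquad\text{integer } n > 0.$$
20 Find a closed form for $\sum_{k=1}^{n} H_k^{(2)}$.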
21 Show that if $H_n = a_n/b_n$, where $a_n$ and $b_n$ are integers, the denominator
$b_n$ is a multiple of $2^{\lfloor \lg n\rfloor}$. Hint: Consider the number $2^{\lfloor \lg n\rfloor - 1}H_n - \frac12$.
22 Prove that the infinite sum
$$\sum_{k\ge1}\Bigl(\frac1k - \frac1{k+z}\Bigr)$$
converges for all complex numbers $z$, except when $z$ is a negative integer;
and show that it equals $H_z$ when $z$ is a nonnegative integer. (Therefore we
can use this formula to define harmonic numbers $H_z$ when $z$ is complex.)
23 Equation (6.81) gives the coefficients of $z/(e^z - 1)$, when expanded in
powers of $z$. What are the coefficients of $z/(e^z + 1)$? Hint: Consider the
identity $(e^z + 1)(e^z - 1) = e^{2z} - 1$.
24 Prove that the tangent number $T_{2n+1}$ is a multiple of $2^n$. Hint: Prove
that all coefficients of $T_{2n}(x)$ and $T_{2n+1}(x)$ are multiples of $2^n$.
25 Equation (6.57) proves that the worm will eventually reach the end of
the rubber band at some time $N$. Therefore there must come a first
time $n$ when he’s closer to the end after $n$ minutes than he was after
$n - 1$ minutes. Show that $n < \frac12 N$.
26 Use summation by parts to evaluate $S_n = \sum_{k=1}^{n} H_k/k$. Hint: Consider
also the related sum $\sum_{k=1}^{n} H_{k-1}/k$.
27 Prove the gcd law (6.111) for Fibonacci numbers.
28 The Lucas number $L_n$ is defined to be $F_{n+1} + F_{n-1}$. Thus, according to
(6.109), we have $F_{2n} = F_n L_n$. Here is a table of the first few values:

n      0  1  2  3  4   5   6   7   8   9   10   11   12   13
L_n    2  1  3  4  7  11  18  29  47  76  123  199  322  521

a Use the repertoire method to show that the solution $Q_n$ to the general
recurrence
$$Q_0 = \alpha; \qquad Q_1 = \beta; \qquad Q_n = Q_{n-1} + Q_{n-2}, \quad n > 1$$
can be expressed in terms of $F_n$ and $L_n$.
b Find a closed form for $L_n$ in terms of $\phi$ and $\hat\phi$.
29 Prove Euler’s identity for continuants, equation (6.134).
30 Generalize (6.136) to find an expression for the incremented continuant
$K(x_1, \ldots, x_{m-1}, x_m + y, x_{m+1}, \ldots, x_n)$, when $1 \le m < n$.
Homework exercises
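31 Find a closed form for the coefficients $\left|{n \atop k}\right|$ in the representation of rising
powers by falling powers:
$$x^{\overline{n}} = \sum_k \left|{n \atop k}\right| x^{\underline{k}}, \qquad\text{integer } n \ge 0.$$
(For example, $x^{\overline4} = x^{\underline4} + 12x^{\underline3} + 36x^{\underline2} + 24x^{\underline1}$, hence $\left|{4 \atop 2}\right| = 36$.)
32 In Chapter 5 we obtained the formulas
$$\sum_{k\le m}\binom{n+k}{k} = \binom{n+m+1}{m} \qquad\text{and}\qquad \sum_{0\le k\le m}\binom{k}{n} = \binom{m+1}{n+1}$$
by unfolding the recurrence $\binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1}$ in two ways. What
identities appear when the analogous recurrence ${n \brace k} = k{n-1 \brace k} + {n-1 \brace k-1}$
is unwound?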
33 Table 250 gives the values of ${n \brack 2}$ and ${n \brace 2}$. What are closed forms (not
involving Stirling numbers) for the next cases, ${n \brack 3}$ and ${n \brace 3}$?
34 What are $\left\langle{-1 \atop k}\right\rangle$ and $\left\langle{-2 \atop k}\right\rangle$, if the basic recursion relation (6.35) is assumed
to hold for all integers $k$ and $n$, and if $\left\langle{n \atop k}\right\rangle = 0$ for all $k < 0$?
35 Prove that, for every $\epsilon > 0$, there exists an integer $n > 1$ (depending
on $\epsilon$) such that $H_n \bmod 1 < \epsilon$.
36 Is it possible to stack $n$ bricks in such a way that the topmost brick is not
above any point of the bottommost brick, yet a person who weighs the
same as 100 bricks can balance on the middle of the top brick without
toppling the pile?
37 Express $\sum_{k=1}^{mn} (k \bmod m)/k(k+1)$ in terms of harmonic numbers, assuming
that $m$ and $n$ are positive integers. What is the limiting value
as $n \to \infty$?
38 Find the indefinite sum $\sum \binom{r}{k}(-1)^k H_k\,\delta k$.
39 Express $\sum_{k=1}^{n} H_k^2$ in terms of $n$ and $H_n$.
40 Prove that 1979 divides the numerator of $\sum_{k=1}^{1319} (-1)^{k-1}/k$, and give a
similar result for 1987. Hint: Use Gauss’s trick to obtain a sum of
fractions whose numerators are 1979. See also exercise 4.
[Margin note: Ah! Those were prime years.]
41 Evaluate the sum
$$\sum_k \binom{\lfloor (n+k)/2\rfloor}{k}$$
in closed form, when $n$ is an integer (possibly negative).
42 If $S$ is a set of integers, let $S + 1$ be the “shifted” set $\{x + 1 \mid x \in S\}$.
How many subsets of $\{1, 2, \ldots, n\}$ have the property that $S \cup (S + 1) =
\{1, 2, \ldots, n+1\}$?
43 Prove that the infinite sum

   .1
 + .01
 + .002
 + .0003
 + .00005
 + .000008
 + .0000013
 + ...

converges to a rational number.
44 Prove the converse of Cassini’s identity (6.106): If $k$ and $m$ are integers
such that $|m^2 - km - k^2| = 1$, then there is an integer $n$ such that $k = \pm F_n$
and $m = \pm F_{n+1}$.
45 Use the repertoire method to solve the general recurrence
$$X_0 = \alpha; \qquad X_1 = \beta; \qquad X_n = X_{n-1} + X_{n-2} + \gamma n + \delta.$$
46 What are cos 36° and cos 72°?
47 Show that
$$2^{n-1}F_n = \sum_k \binom{n}{2k+1} 5^k,$$
and use this identity to deduce the values of $F_p \bmod p$ and $F_{p+1} \bmod p$
when $p$ is prime.
48 Prove that zero-valued parameters can be removed from continuant polynomials
by collapsing their neighbors together:
$$K_n(x_1, \ldots, x_{m-1}, 0, x_{m+1}, \ldots, x_n) = K_{n-2}(x_1, \ldots, x_{m-2}, x_{m-1} + x_{m+1}, x_{m+2}, \ldots, x_n), \qquad 1 < m < n.$$
49 Find the continued fraction representation of the number $\sum_{n\ge1} 2^{-\lfloor n\phi\rfloor}$.
50 Define $f(n)$ for all positive integers $n$ by the recurrence
$$f(1) = 1; \qquad f(2n) = f(n); \qquad f(2n+1) = f(n) + f(n+1).$$
a For which $n$ is $f(n)$ even?
b Show that $f(n)$ can be expressed in terms of continuants.
Exam problems
51 Let $p$ be a prime number.
a Prove that ${p \brace k} \equiv {p \brack k} \equiv 0 \pmod p$, for $1 < k < p$.
b Prove that ${p-1 \brack k} \equiv 1 \pmod p$, for $1 \le k < p$.
c Prove that ${2p-2 \brace p} \equiv {2p-2 \brack p} \equiv 0 \pmod p$.
d Prove that if $p > 3$ we have ${p \brack 2} \equiv 0 \pmod{p^2}$. Hint: Consider $p^{\underline{p}}$.
52 Let $H_n$ be written in lowest terms as $a_n/b_n$.
a Prove that $p \backslash b_n \iff p \nmid a_{\lfloor n/p\rfloor}$, if $p$ is prime.
b Find all $n > 0$ such that $a_n$ is divisible by 5.
53 Find a closed form for $\sum_{k=0}^{m} \binom{n}{k}^{-1}(-1)^k H_k$, when $0 \le m < n$. Hint:
Exercise 5.42 has the sum without the $H_k$ factor.
54 Let $n > 0$. The purpose of this exercise is to show that the denominator
of $B_{2n}$ is the product of all primes $p$ such that $(p-1) \backslash (2n)$.
a Show that $S_m(p) + [(p-1) \backslash m]$ is a multiple of $p$, when $p$ is prime
and $m > 0$.
b Use the result of part (a) to show that
$$B_{2n} + \sum_{p\text{ prime}} \frac{[(p-1) \backslash (2n)]}{p} = I_{2n}\quad\text{is an integer.}$$
Hint: It suffices to prove that, if $p$ is any prime, the denominator of
the fraction $B_{2n} + [(p-1) \backslash (2n)]/p$ is not divisible by $p$.
c Prove that the denominator of $B_{2n}$ is always an odd multiple of 6,
and it is equal to 6 for infinitely many $n$.
55 Prove (6.70) as a corollary of a more general identity, by summing
$$\sum_{0\le k<n} \binom{k}{m}\binom{x+k}{k}$$
and differentiating with respect to $x$.
56 Evaluate $\sum_{k\ne m} \binom{n}{k}(-1)^k k^{n+1}/(k-m)$ in closed form as a function of the
integers $m$ and $n$. (The sum is over all integers $k$ except for the value
$k = m$.)
57 The “wraparound binomial coefficients of order 5” are defined by
$$\left(\!\!\binom{n}{k}\!\!\right) = \left(\!\!\binom{n-1}{k}\!\!\right) + \left(\!\!\binom{n-1}{(k-1) \bmod 5}\!\!\right), \qquad n > 0,$$
and $\left(\!\binom{0}{k}\!\right) = [k = 0]$. Let $Q_n$ be the difference between the largest and
smallest of these numbers in row $n$:
$$Q_n = \max_{0\le k<5}\left(\!\!\binom{n}{k}\!\!\right) - \min_{0\le k<5}\left(\!\!\binom{n}{k}\!\!\right).$$
Find and prove a relation between $Q_n$ and the Fibonacci numbers.
58 Find closed forms for $\sum_{n\ge0} F_n^2 z^n$ and $\sum_{n\ge0} F_n^3 z^n$. What do you deduce
about the quantity $F_{n+1}^3 - 4F_n^3 - F_{n-1}^3$?
59 Prove that if $m$ and $n$ are positive integers, there exists an integer $x$ such
that $F_x \equiv m \pmod{3^n}$.
60 Find all positive integers $n$ such that either $F_n + 1$ or $F_n - 1$ is a prime
number.
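61 Prove the identity
$$\sum_{k=0}^{n} \frac{1}{F_{2^k}} = 3 - \frac{F_{2^n-1}}{F_{2^n}}, \qquad\text{integer } n \ge 1.$$
What is $\sum_{k=0}^{n} 1/F_{3\cdot2^k}$?
62 Let $A_n = \phi^n + \phi^{-n}$ and $B_n = \phi^n - \phi^{-n}$.
a Find constants $\alpha$ and $\beta$ such that $A_n = \alpha A_{n-1} + \beta A_{n-2}$ and
$B_n = \alpha B_{n-1} + \beta B_{n-2}$ for all $n \ge 0$.
b Express $A_n$ and $B_n$ in terms of $F_n$ and $L_n$ (see exercise 28).
c Prove that $\sum_{k=1}^{n} 1/(F_{2k+1} + 1) = B_n/A_{n+1}$.
d Find a closed form for $\sum_{k=1}^{n} 1/(F_{2k+1} - 1)$.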
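Bonus problems
[Margin note: Bogus problems]
63 How many permutations $\pi_1\pi_2\ldots\pi_n$ of $\{1, 2, \ldots, n\}$ have exactly $k$ indices
$j$ such that
a $\pi_i < \pi_j$ for all $i < j$? (Such $j$ are called “left-to-right maxima.”)
b $\pi_j > j$? (Such $j$ are called “excedances.”)
64 What is the denominator of ${1/2 \brack 1/2-n}$, when this fraction is reduced to
lowest terms?
65 Prove the identity
$$\int_0^1\!\!\cdots\!\int_0^1 f\bigl(\lfloor x_1 + \cdots + x_n\rfloor\bigr)\,dx_1\ldots dx_n = \sum_k \left\langle{n \atop k}\right\rangle \frac{f(k)}{n!}.$$
66 Show that $\left\langle\!\left\langle{n \atop 1}\right\rangle\!\right\rangle = 2\left\langle{n \atop 1}\right\rangle$, and find a closed form for $\left\langle\!\left\langle{n \atop 2}\right\rangle\!\right\rangle$.
67 Find a closed form for $\sum_{k=1}^{n} k^2 H_{n+k}$.
68 Show that the generalized harmonic numbers of exercise 22 have the
power series expansion
$$H_z = \sum_{n\ge2} (-1)^n H_\infty^{(n)} z^{n-1}.$$
69 Prove that the generalized factorial of equation (5.83) can be written
$$\prod_{k\ge1} \Bigl(1 + \frac1k\Bigr)^{z} \Big/ \Bigl(1 + \frac zk\Bigr)$$
by considering the limit as $n \to \infty$ of the first $n$ factors of this infinite
product. Show that $\frac{d}{dz}(z!)$ is related to the general harmonic numbers of
exercise 22.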
80 Show that continuant polynomials appear in the matrix product
$$\begin{pmatrix}0&1\\1&x_1\end{pmatrix}\begin{pmatrix}0&1\\1&x_2\end{pmatrix}\cdots\begin{pmatrix}0&1\\1&x_n\end{pmatrix}$$
and in the determinant
$$\det\begin{pmatrix} x_1 & 1 & 0 & 0 & \ldots & 0\\ -1 & x_2 & 1 & 0 & & 0\\ 0 & -1 & x_3 & 1 & & \vdots\\ \vdots & & -1 & \ddots & \ddots & \\ & & & \ddots & & 1\\ 0 & 0 & \ldots & & -1 & x_n \end{pmatrix}.$$
81 Generalizing (6.146), find a continued fraction related to the generating
function $\sum_{n\ge1} z^{\lfloor n\alpha\rfloor}$, when $\alpha$ is any positive irrational number.
82 Let $m$ and $n$ be odd, positive integers. Find closed forms for
$$S^+_{m,n} = \sum_{k\ge0} \frac{1}{F_{2mk+n} + F_m}; \qquad S^-_{m,n} = \sum_{k\ge0} \frac{1}{F_{2mk+n} - F_m}.$$
Hint: The sums in exercise 62 are $S^+_{1,3} - S^+_{1,2n+3}$ and $S^-_{1,3} - S^-_{1,2n+3}$.
83 Let $\alpha$ be an irrational number in (0, 1) and let $a_1, a_2, a_3, \ldots$ be the
partial quotients in its continued fraction representation. Show that
$|D(\alpha, n)| < 2$ when $n = K(a_1, \ldots, a_m)$, where $D$ is the discrepancy
defined in Chapter 3.
84 Let $Q_n$ be the largest denominator on level $n$ of the Stern-Brocot tree.
(Thus $(Q_0, Q_1, Q_2, Q_3, Q_4, \ldots) = (1, 2, 3, 5, 8, \ldots)$ according to the diagram
in Chapter 4.) Prove that $Q_n = F_{n+2}$.
85 Characterize all $N$ such that the Fibonacci residues
$$\{F_0 \bmod N,\ F_1 \bmod N,\ F_2 \bmod N,\ \ldots\}$$
form the complete set $\{0, 1, \ldots, N-1\}$. (See exercise 59.)
Research problems
86 What is the best way to extend the definition of ${n \brace k}$ to arbitrary real
values of $n$ and $k$?
87 Let $H_n$ be written in lowest terms as $a_n/b_n$, as in exercise 52.
a Are there infinitely many $n$ with $11 \backslash a_n$?
b Are there infinitely many $n$ with $b_n = \mathrm{lcm}(1, 2, \ldots, n)$? (Two such
values are $n = 250$ and $n = 1000$.)
88 Prove that $\gamma$ and $e^\gamma$ are irrational.
7
Generating Functions
THE MOST POWERFUL WAY to deal with sequences of numbers, as far
as anybody knows, is to manipulate infinite series that “generate” those se-
quences.
We’ve learned a lot of sequences and we’ve seen a few generating
functions; now we’re ready to explore generating functions in depth, and to
see how remarkably useful they are.
7.1
DOMINO THEORY AND CHANGE
Generating functions are important enough, and for many of us new
enough, to justify a relaxed approach as we begin to look at them more closely.
So let’s start this chapter with some fun and games as we try to develop our
intuitions about generating functions. We will study two applications of the
ideas, one involving dominoes and the other involving coins.
How many ways $T_n$ are there to completely cover a 2 × n rectangle with
2 × 1 dominoes? We assume that the dominoes are identical (either because
they’re face down, or because someone has rendered them indistinguishable,
say by painting them all red); thus only their orientations (vertical or
horizontal) matter, and we can imagine that we’re working with domino-shaped
tiles. For example, there are three tilings of a 2 × 3 rectangle, namely ▯▯▯,
▯⊟, and ⊟▯ (where ▯ is a vertical domino and ⊟ is a pair of horizontal dominoes,
one above the other); so $T_3 = 3$.
To find a closed form for general $T_n$ we do our usual first thing, look at
small cases. When n = 1 there’s obviously just one tiling, ▯; and when n = 2
there are two, ▯▯ and ⊟.
[Margin note: “Let me count the ways.” -E. B. Browning]
How about when n = 0; how many tilings of a 2 × 0 rectangle are there?
It’s not immediately clear what this question means, but we’ve seen similar
situations before: There is one permutation of zero objects (namely the empty
permutation), so 0! = 1. There is one way to choose zero things from n things
(namely to choose nothing), so $\binom{n}{0} = 1$. There is one way to partition the
empty set into zero nonempty subsets, but there are no such ways to partition
a nonempty set; so ${n \brace 0} = [n = 0]$. By such reasoning we can conclude that
there’s just one way to tile a 2 × 0 rectangle with dominoes, namely to use
no dominoes; therefore $T_0 = 1$. (This spoils the simple pattern $T_n = n$ that
holds when n = 1, 2, and 3; but that pattern was probably doomed anyway,
since $T_0$ wants to be 1 according to the logic of the situation.) A proper
understanding of the null case turns out to be useful whenever we want to
solve an enumeration problem.
Let’s look at one more small case, n = 4. There are two possibilities for
tiling the left edge of the rectangle: we put either a vertical domino or two
horizontal dominoes there. If we choose a vertical one, the partial solution is
▯ followed by an empty 2 × 3 rectangle, and the remaining 2 × 3 rectangle can be
covered in $T_3$ ways. If we choose two horizontals, the partial solution ⊟ followed
by an empty 2 × 2 rectangle can be completed in $T_2$ ways. Thus
$T_4 = T_3 + T_2 = 5$. (The five tilings are ▯▯▯▯, ▯▯⊟, ▯⊟▯, ⊟▯▯, and ⊟⊟.)
We now know the first five values of $T_n$:

n      0  1  2  3  4
T_n    1  1  2  3  5

These look suspiciously like the Fibonacci numbers, and it’s not hard to see
why: The reasoning we used to establish $T_4 = T_3 + T_2$ easily generalizes to
$T_n = T_{n-1} + T_{n-2}$, for $n \ge 2$. Thus we have the same recurrence here as for
the Fibonacci numbers, except that the initial values $T_0 = 1$ and $T_1 = 1$ are a
little different. But these initial values are the consecutive Fibonacci numbers
$F_1$ and $F_2$, so the T’s are just Fibonacci numbers shifted up one place:
$$T_n = F_{n+1}, \qquad\text{for } n \ge 0.$$
(We consider this to be a closed form for $T_n$, because the Fibonacci numbers
are important enough to be considered “known.” Also, $F_n$ itself has a closed
form (6.123) in terms of algebraic operations.) Notice that this equation
confirms the wisdom of setting $T_0 = 1$.
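The case analysis on the left edge translates directly into a recursive enumeration. The sketch below is illustrative only; the encoding is an assumption: a tiling is written as a string in which `'V'` stands for one vertical domino and `'H'` for a stacked pair of horizontal dominoes.

```python
def tilings(n):
    """All tilings of a 2 x n strip as strings over {'V', 'H'}.

    'V' is a vertical domino (one column wide); 'H' is a pair of stacked
    horizontal dominoes (two columns wide). The empty string is the null tiling.
    """
    if n == 0:
        return ['']
    if n == 1:
        return ['V']
    # Either the left edge holds a vertical domino, or two horizontal ones.
    return ['V' + t for t in tilings(n - 1)] + ['H' + t for t in tilings(n - 2)]

def fibonacci(n):
    """F_n with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(tilings(4))
print([len(tilings(n)) for n in range(10)])
print([fibonacci(n + 1) for n in range(10)])
```

The last two printed lists should agree if $T_n = F_{n+1}$, and `tilings(0)` contains exactly one element, the null tiling.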
But what does all this have to do with generating functions? Well, we’re
about to get to that: there’s another way to figure out what $T_n$ is. This new
way is based on a bold idea. Let’s consider the “sum” of all possible 2 × n
tilings, for all $n \ge 0$, and call it T:

T = | + ▯ + ▯▯ + ⊟ + ▯▯▯ + ▯⊟ + ⊟▯ + ··· .    (7.1)

(The first term ‘|’ on the right stands for the null tiling of a 2 × 0 rectangle.)
This sum T represents lots of information. It’s useful because it lets us prove
things about T as a whole rather than forcing us to prove them (by induction)
about its individual terms.
[Margin note: To boldly go where no tiling has gone before.]
The terms of this sum stand for tilings, which are combinatorial objects.
We won’t be fussy about what’s considered legal when infinitely many tilings
are added together; everything can be made rigorous, but our goal right now
is to expand our consciousness beyond conventional algebraic formulas.
We’ve added the patterns together, and we can also multiply them, by
juxtaposition. For example, we can multiply the tilings ▯ and ⊟ to get the
new tiling ▯⊟. But notice that multiplication is not commutative; that is, the
order of multiplication counts: ▯⊟ is different from ⊟▯.
Using this notion of multiplication it’s not hard to see that the null
tiling plays a special role: it is the multiplicative identity. For instance,
| × ⊟ = ⊟ × | = ⊟.
Now we can use domino arithmetic to manipulate the infinite sum T:

T = | + ▯ + ▯▯ + ⊟ + ▯▯▯ + ▯⊟ + ⊟▯ + ···
  = | + ▯(| + ▯ + ▯▯ + ⊟ + ···) + ⊟(| + ▯ + ▯▯ + ⊟ + ···)
  = | + ▯T + ⊟T.    (7.2)

Every valid tiling occurs exactly once in each right side, so what we’ve done is
reasonable even though we’re ignoring the cautions in Chapter 2 about “absolute
convergence.” The bottom line of this equation tells us that everything
in T is either the null tiling, or is a vertical tile followed by something else
in T, or is two horizontal tiles followed by something else in T.
[Margin note: I have a gut feeling that these sums must converge, as long as the dominoes are small enough.]
So now let’s try to solve the equation for T. Replacing the T on the left
by |T and subtracting the last two terms on the right from both sides of the
equation, we get

(| − ▯ − ⊟)T = |.    (7.3)

For a consistency check, here’s an expanded version:

  | + ▯ + ▯▯ + ⊟ + ▯▯▯ + ▯⊟ + ⊟▯ + ···
    − ▯ − ▯▯ − ▯▯▯ − ▯⊟ − ▯▯▯▯ − ▯▯⊟ − ▯⊟▯ − ···
    − ⊟ − ⊟▯ − ⊟▯▯ − ⊟⊟ − ⊟▯▯▯ − ⊟▯⊟ − ⊟⊟▯ − ···

Every term in the top row, except the first, is cancelled by a term in either
the second or third row, so our equation is correct.
So far it’s been fairly easy to make combinatorial sense of the equations
we’ve been working with. Now, however, to get a compact expression for T
we cross a combinatorial divide. With a leap of algebraic faith we divide both
sides of equation (7.3) by | − ▯ − ⊟ to get

T = | / (| − ▯ − ⊟).    (7.4)

(Multiplication isn’t commutative, so we’re on the verge of cheating, by not
distinguishing between left and right division. In our application it doesn’t
matter, because | commutes with everything. But let’s not be picky, unless
our wild ideas lead to paradoxes.)
The next step is to expand this fraction as a power series, using the rule
$$\frac{1}{1-z} = 1 + z + z^2 + z^3 + \cdots.$$
The null tiling |, which is the multiplicative identity for our combinatorial
arithmetic, plays the part of 1, the usual multiplicative identity; and ▯ + ⊟
plays z. So we get the expansion

| / (| − ▯ − ⊟) = | + (▯ + ⊟) + (▯ + ⊟)² + (▯ + ⊟)³ + ···
  = | + (▯ + ⊟) + (▯▯ + ▯⊟ + ⊟▯ + ⊟⊟)
    + (▯▯▯ + ▯▯⊟ + ▯⊟▯ + ▯⊟⊟ + ⊟▯▯ + ⊟▯⊟ + ⊟⊟▯ + ⊟⊟⊟) + ··· .

This is T, but the tilings are arranged in a different order than we had before.
Every tiling appears exactly once in this sum; for example, ▯⊟⊟▯▯⊟▯ appears
in the expansion of (▯ + ⊟)⁷.
We can get useful information from this infinite sum by compressing it
down, ignoring details that are not of interest. For example, we can imagine
that the patterns become unglued and that the individual dominoes commute
with each other; then a term like ▯⊟⊟▯▯⊟▯ becomes ▯⁴▭⁶, because it contains
four verticals and six horizontals (▭ stands for a single horizontal domino).
Collecting like terms gives us the series

T = | + ▯ + ▯² + ▭² + ▯³ + 2▯▭² + ▯⁴ + 3▯²▭² + ▭⁴ + ··· .

The 2▯▭² here represents the two terms of the old expansion, ▯⊟ and ⊟▯, that
have one vertical and two horizontal dominoes; similarly 3▯²▭² represents the
three terms ▯▯⊟, ▯⊟▯, and ⊟▯▯. We’re essentially treating ▯ and ▭ as ordinary
(commutative) variables.
We can find a closed form for the coefficients in the commutative version
of T by using the binomial theorem:
$$\frac{|}{|-(▯+▭^2)} = |+(▯+▭^2)+(▯+▭^2)^2+(▯+▭^2)^3+\cdots$$
$$= \sum_{k\ge0} (▯+▭^2)^k$$
$$= \sum_{j,k\ge0} \binom{k}{j}\,▯^j\,▭^{2k-2j}$$
$$= \sum_{j,m\ge0} \binom{j+m}{j}\,▯^j\,▭^{2m}. \qquad (7.5)$$
(The last step replaces $k - j$ by $m$; this is legal because we have $\binom{k}{j} = 0$ when
$0 \le k < j$.) We conclude that $\binom{j+m}{j}$ is the number of ways to tile a 2 × (j + 2m)
rectangle with j vertical dominoes and 2m horizontal dominoes. For example,
we recently looked at the 2 × 10 tiling ▯⊟⊟▯▯⊟▯, which involves four verticals
and six horizontals; there are $\binom{4+3}{4} = 35$ such tilings in all, so one of the terms
in the commutative version of T is 35▯⁴▭⁶.
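The coefficient count in (7.5) can be confirmed by expanding the powers of (vertical + horizontal pair) literally. The sketch below is an illustration under an assumed encoding: words over the letters `'V'` (one vertical domino, width 1) and `'H'` (a stacked horizontal pair, width 2) play the role of the noncommutative products, and letting the dominoes commute amounts to counting letters.

```python
from collections import Counter
from itertools import product
from math import comb

def width(word):
    """Width of the tiling spelled by word: 'V' covers 1 column, 'H' covers 2."""
    return word.count('V') + 2 * word.count('H')

def tilings_of_width(n):
    """All words of total width n, gathered from the terms of (V + H)^k for k = 0..n."""
    words = []
    for k in range(n + 1):
        for letters in product('VH', repeat=k):   # the 2^k terms of (V + H)^k
            word = ''.join(letters)
            if width(word) == n:
                words.append(word)
    return words

# "Going commutative": keep only (number of V's, number of H's) for each word.
by_content = Counter((w.count('V'), w.count('H')) for w in tilings_of_width(10))

print(by_content[(4, 3)], comb(4 + 3, 4))
print(all(count == comb(j + m, j) for (j, m), count in by_content.items()))
```

A word with j letters `'V'` and m letters `'H'` contributes to the coefficient of the term with j verticals and 2m horizontals, so the counter should reproduce the binomial coefficients.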
We can suppress even more detail by ignoring the orientation of the
dominoes. Suppose we don’t care about the horizontal/vertical breakdown;
we only want to know about the total number of 2 × n tilings. (This, in
fact, is the number $T_n$ we started out trying to discover.) We can collect
the necessary information by simply substituting a single quantity, z, for
▯ and ▭. And we might as well also replace | by 1, getting
$$T = \frac{1}{1-z-z^2}. \qquad (7.6)$$
[Margin note: Now I’m disoriented.]
This is the generating function (6.117) for Fibonacci numbers, except for a
missing factor of z in the numerator; so we conclude that the coefficient of
$z^n$ in T is $F_{n+1}$.
The compact representations |/(| − ▯ − ⊟), |/(| − ▯ − ▭²), and 1/(1 − z − z²)
that we have deduced for T are called generating functions, because they
generate the coefficients of interest.
Incidentally, our derivation implies that the number of 2 × n domino
tilings with exactly m pairs of horizontal dominoes is $\binom{n-m}{m}$. (This follows
because there are j = n − 2m vertical dominoes, hence there are
$$\binom{j+m}{j} = \binom{j+m}{m} = \binom{n-m}{m}$$
ways to do the tiling according to our formula.) We observed in Chapter 6
that $\binom{n-m}{m}$ is the number of Morse code sequences of length n that contain
m dashes; in fact, it’s easy to see that 2 × n domino tilings correspond directly
to Morse code sequences. (The tiling ▯⊟⊟▯▯⊟▯ corresponds to ‘•−−••−•’.)
Thus domino tilings are closely related to the continuant polynomials we
studied in Chapter 6. It’s a small world.
We have solved the $T_n$ problem in two ways. The first way, guessing the
answer and proving it by induction, was easier; the second way, using infinite
sums of domino patterns and distilling out the coefficients of interest, was
fancier. But did we use the second method only because it was amusing to
play with dominoes as if they were algebraic variables? No; the real reason
for introducing the second way was that the infinite-sum approach is a lot
more powerful. The second method applies to many more problems, because
it doesn’t require us to make magic guesses.
Let’s generalize up a notch, to a problem where guesswork will be beyond
us. How many ways $U_n$ are there to tile a 3 × n rectangle with dominoes?
The first few cases of this problem tell us a little: The null tiling gives
$U_0 = 1$. There is no valid tiling when n = 1, since a 2 × 1 domino doesn’t fill
a 3 × 1 rectangle, and since there isn’t room for two. The next case, n = 2,
can easily be done by hand; there are three tilings (two vertical dominoes
above a horizontal one, a horizontal one above two verticals, and three
stacked horizontals), so $U_2 = 3$.
(Come to think of it we already knew this, because the previous problem told
us that $T_3 = 3$; the number of ways to tile a 3 × 2 rectangle is the same as the
number to tile a 2 × 3.) When n = 3, as when n = 1, there are no tilings. We
can convince ourselves of this either by making a quick exhaustive search or
by looking at the problem from a higher level: The area of a 3 × 3 rectangle is
odd, so we can’t possibly tile it with dominoes whose area is even. (The same
argument obviously applies to any odd n.) Finally, when n = 4 there seem
to be about a dozen tilings; it’s difficult to be sure about the exact number
without spending a lot of time to guarantee that the list is complete.
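The “quick exhaustive search” mentioned above is easy to delegate to a machine. The following is a minimal backtracking sketch, not the method the text goes on to develop; the function name `count_tilings` and the grid representation are assumptions made for illustration. It always fills the first empty cell (scanning column by column), so the domino covering that cell must extend either downward or to the right.

```python
def count_tilings(rows, cols):
    """Count domino tilings of a rows x cols rectangle by exhaustive search."""
    filled = [[False] * cols for _ in range(rows)]

    def solve():
        # Locate the first empty cell, scanning columns left to right, top to bottom.
        for c in range(cols):
            for r in range(rows):
                if not filled[r][c]:
                    total = 0
                    # Vertical domino covering (r, c) and (r + 1, c).
                    if r + 1 < rows and not filled[r + 1][c]:
                        filled[r][c] = filled[r + 1][c] = True
                        total += solve()
                        filled[r][c] = filled[r + 1][c] = False
                    # Horizontal domino covering (r, c) and (r, c + 1).
                    if c + 1 < cols and not filled[r][c + 1]:
                        filled[r][c] = filled[r][c + 1] = True
                        total += solve()
                        filled[r][c] = filled[r][c + 1] = False
                    return total
        return 1  # no empty cell left: exactly one completed tiling

    return solve()

print([count_tilings(3, n) for n in range(7)])
```

The printed list gives the counts for 3 × n rectangles with n = 0, ..., 6; the odd cases come out as 0, in line with the area argument, and the null case n = 0 counts as one tiling.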
So let’s try the infinite-sum approach that worked last time:

U = | + (the three tilings with n = 2) + (the tilings with n = 4) + ··· .    (7.7)

Every non-null tiling begins with either ┗ (a vertical domino in the top two
rows, above a horizontal domino) or ┏ (a horizontal domino above a vertical
domino in the bottom two rows) or ≡ (three stacked horizontal dominoes); but
unfortunately the first two of these three possibilities don’t simply factor out
and leave us with U again. The sum of all terms in U that begin with ┗ can,
however, be written as ┗V, where V is the sum of all domino tilings of a
mutilated 3 × n rectangle that has its lower left corner missing. Similarly, the
terms of U that begin with ┏ can be written ┏Λ, where Λ consists of all
rectangular tilings lacking their upper left corner. The series Λ is a mirror
image of V. These factorizations allow us to write

U = | + ┗V + ┏Λ + ≡U.

And we can factor V and Λ as well, because such tilings can begin in only
two ways:

V = ▯U + ΞV,
Λ = ▯U + Ξ′Λ.

(Here ▯ is a vertical domino filling the two open cells at the left edge, and
Ξ and Ξ′ are mirror-image blocks of three staggered horizontal dominoes, which
leave the same kind of missing corner behind.)
Now we have three equations in three unknowns (U, V, and Λ). We can solve
them by first solving for V and Λ in terms of U, then plugging the results
into the equation for U:

V = (| − Ξ)⁻¹▯U,    Λ = (| − Ξ′)⁻¹▯U;
U = | + ┗(| − Ξ)⁻¹▯U + ┏(| − Ξ′)⁻¹▯U + ≡U.

And the final equation can be solved for U, giving the compact formula

U = | / (| − ┗(| − Ξ)⁻¹▯ − ┏(| − Ξ′)⁻¹▯ − ≡).    (7.8)

This expression defines the infinite sum U, just as (7.4) defines T.
The next step is to go commutative. Everything simplifies beautifully
when we detach all the dominoes and use only powers of ▯ and ▭:
$$U = \frac{1}{1 - ▯▭(1-▭^3)^{-1}▯ - ▯▭(1-▭^3)^{-1}▯ - ▭^3}$$
$$= \frac{1-▭^3}{(1-▭^3)^2 - 2▯^2▭}$$
$$= \frac{(1-▭^3)^{-1}}{1 - 2▯^2▭(1-▭^3)^{-2}}$$
$$= \frac{1}{1-▭^3} + \frac{2▯^2▭}{(1-▭^3)^3} + \frac{4▯^4▭^2}{(1-▭^3)^5} + \frac{8▯^6▭^3}{(1-▭^3)^7} + \cdots$$
$$= \sum_{k,m\ge0} \binom{m+2k}{m}\,2^k\,▯^{2k}\,▭^{k+3m}.$$
(This derivation deserves careful scrutiny. The last step uses the formula
$(1-w)^{-2k-1} = \sum_m \binom{m+2k}{m} w^m$, identity (5.56).) Let’s take a good look at
the bottom line to see what it tells us. First, it says that every 3 × n tiling
uses an even number of vertical dominoes. Moreover, if there are 2k verticals,
there must be at least k horizontals, and the total number of horizontals must
be k + 3m for some $m \ge 0$. Finally, the number of possible tilings with 2k
verticals and k + 3m horizontals is exactly $\binom{m+2k}{m}2^k$.
We now are able to analyze the 3 × 4 tilings that left us doubtful when we
began looking at the 3 × n problem. When n = 4 the total area is 12, so we
need six dominoes altogether. There are 2k verticals and k + 3m horizontals,
for some k and m; hence 2k + k + 3m = 6. In other words, k + m = 2.
If we use no verticals, then k = 0 and m = 2; the number of possibilities
is $\binom{2+0}{2}2^0 = 1$. (This accounts for the tiling that consists of six horizontal
dominoes.) If we use two verticals, then k = 1 and m = 1; there are
$\binom{1+2}{1}2^1 = 6$ such tilings. And if we use four verticals, then k = 2 and m = 0;
there are $\binom{0+4}{0}2^2 = 4$ such tilings, making a total of $U_4 = 11$. In general if n is
even, this reasoning shows that $k + m = \frac12 n$, hence $\binom{m+2k}{m} = \binom{n/2+k}{n/2-k}$
and the total number of 3 × n tilings is
$$U_n = \sum_k \binom{n/2+k}{n/2-k}\,2^k = \sum_m \binom{n-m}{m}\,2^{n/2-m}. \qquad (7.9)$$
[Margin note: I learned in another class about “regular expressions.” If I’m not mistaken, we can write U = (┗ Ξ* ▯ + ┏ Ξ′* ▯ + ≡)* in the language of regular expressions; so there must be some connection between regular expressions and generating functions.]
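The bookkeeping for n = 4 and the general sum over k can be replayed in a few lines. This is a small sketch with hypothetical function names; it evaluates the count $\binom{m+2k}{m}2^k$ for each admissible pair (k, m) with k + m = n/2 and adds the results.

```python
from math import comb

def breakdown(n):
    """Map (verticals, horizontals) -> number of 3 x n tilings with that content, n even."""
    half = n // 2
    result = {}
    for k in range(half + 1):
        m = half - k                       # k + m = n/2
        result[(2 * k, k + 3 * m)] = comb(m + 2 * k, m) * 2 ** k
    return result

def u_formula(n):
    """Total number of 3 x n tilings according to the sum over k; 0 for odd n."""
    if n % 2 == 1:
        return 0
    half = n // 2
    return sum(comb(half + k, half - k) * 2 ** k for k in range(half + 1))

print(breakdown(4))
print([u_formula(n) for n in range(0, 10, 2)])
```

For n = 4 the dictionary lists the three cases discussed above (no verticals, two verticals, four verticals), and in every case the number of dominoes is 3n/2.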
As before, we can also substitute z for both ▯ and ▭, getting a generating
function that doesn’t discriminate between dominoes of particular
persuasions. The result is
$$U = \frac{1}{1 - z^3(1-z^3)^{-1} - z^3(1-z^3)^{-1} - z^3} = \frac{1-z^3}{1-4z^3+z^6}. \qquad (7.10)$$
If we expand this quotient into a power series, we get
$$U = 1 + U_2 z^3 + U_4 z^6 + U_6 z^9 + U_8 z^{12} + \cdots,$$
a generating function for the numbers $U_n$. (There’s a curious mismatch between
subscripts and exponents in this formula, but it is easily explained. The
coefficient of $z^9$, for example, is $U_6$, which counts the tilings of a 3 × 6 rectangle.
This is what we want, because every such tiling contains nine dominoes.)
We could proceed to analyze (7.10) and get a closed form for the coefficients,
but it’s better to save that for later in the chapter after we’ve gotten
more experience. So let’s divest ourselves of dominoes for the moment and
proceed to the next advertised problem, “change.”
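Expanding a quotient such as (7.10) into a power series is a mechanical long division. The sketch below is one simple way to do it, assuming polynomials are stored as coefficient lists with the constant term first and that the denominator has constant term 1; the function name `series` is illustrative.

```python
def series(num, den, terms):
    """First `terms` power-series coefficients of num(z) / den(z), assuming den[0] == 1.

    If den(z) * S(z) = num(z), then comparing coefficients of z^n gives
    s_n = num_n - den_1 * s_{n-1} - den_2 * s_{n-2} - ...
    """
    coeffs = []
    for n in range(terms):
        c = num[n] if n < len(num) else 0
        for i in range(1, min(n, len(den) - 1) + 1):
            c -= den[i] * coeffs[n - i]
        coeffs.append(c)
    return coeffs

# (1 - z^3) / (1 - 4 z^3 + z^6), as in (7.10)
u = series([1, 0, 0, -1], [1, 0, 0, -4, 0, 0, 1], 13)
print(u[0::3])          # coefficients of z^0, z^3, z^6, z^9, z^12

# 1 / (1 - z - z^2), as in (7.6)
print(series([1], [1, -1, -1], 8))
```

Only every third coefficient of the first series is nonzero, which reflects the mismatch between subscripts and exponents noted in the text.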
How many ways are there to pay 50 cents? We assume that the payment
must be made with pennies ①, nickels ⑤, dimes ⑩, quarters ㉕, and
half-dollars ㊿. George Pólya [239] popularized this problem by showing that it
can be solved with generating functions in an instructive way.
[Margin note: Ah yes, I remember when we had half-dollars.]
Let’s set up infinite sums that represent all possible ways to give change,
just as we tackled the domino problems by working with infinite sums that
represent all possible domino patterns. It’s simplest to start by working with
fewer varieties of coins, so let’s suppose first that we have nothing but pennies.
The sum of all ways to leave some number of pennies (but just pennies) in
change can be written

P = ∅ + ① + ①① + ①①① + ①①①① + ···
  = ∅ + ① + ①² + ①³ + ①⁴ + ··· .

The first term stands for the way to leave no pennies, the second term stands
for one penny, then two pennies, three pennies, and so on. Now if we’re
allowed to use both pennies and nickels, the sum of all possible ways is

\[ N = (\varnothing + ⑤ + ⑤^2 + ⑤^3 + ⑤^4 + \cdots)\,P, \]

since each payment has a certain number of nickels chosen from the first
factor and a certain number of pennies chosen from P. (Notice that N is
not the sum $\varnothing + ① + ⑤ + (① + ⑤)^2 + (① + ⑤)^3 + \cdots$, because such a
sum includes many types of payment more than once. For example, the term
$(① + ⑤)^2 = ①① + ①⑤ + ⑤① + ⑤⑤$ treats ①⑤ and ⑤① as if they were
different, but we want to list each set of coins only once without respect to
order.)
Similarly, if dimes are permitted as well, we get the infinite sum

\[ D = (\varnothing + ⑩ + ⑩^2 + ⑩^3 + ⑩^4 + \cdots)\,N, \]

which includes terms like $⑩^3 ⑤^3 ①^5 = ⑩⑩⑩⑤⑤⑤①①①①①$ when it is
expanded in full. Each of these terms is a different way to make change.
Adding quarters and then half-dollars to the realm of possibilities gives

\[ Q = (\varnothing + ㉕ + ㉕^2 + ㉕^3 + ㉕^4 + \cdots)\,D; \]
\[ C = (\varnothing + ㊿ + ㊿^2 + ㊿^3 + ㊿^4 + \cdots)\,Q. \]

Our problem is to find the number of terms in C worth exactly 50¢.

Margin note: Coins of the realm.

A simple trick solves this problem nicely: We can replace ① by $z$, ⑤ by $z^5$,
⑩ by $z^{10}$, ㉕ by $z^{25}$, and ㊿ by $z^{50}$. Then each term is replaced by $z^n$,
where n is the monetary value of the original term. For example, the term
㊿⑩⑤⑤① becomes $z^{50+10+5+5+1} = z^{71}$. The four ways of paying 13 cents,
namely $⑩①^3$, $⑤①^8$, $⑤^2①^3$, and $①^{13}$, each reduce to $z^{13}$; hence the coefficient
of $z^{13}$ will be 4 after the z-substitutions are made.
Let $P_n$, $N_n$, $D_n$, $Q_n$, and $C_n$ be the numbers of ways to pay n cents
when we’re allowed to use coins that are worth at most 1, 5, 10, 25, and 50
cents, respectively. Our analysis tells us that these are the coefficients of $z^n$
in the respective power series

\[
\begin{aligned}
P &= 1 + z + z^2 + z^3 + z^4 + \cdots, \\
N &= (1 + z^5 + z^{10} + z^{15} + z^{20} + \cdots)\,P, \\
D &= (1 + z^{10} + z^{20} + z^{30} + z^{40} + \cdots)\,N, \\
Q &= (1 + z^{25} + z^{50} + z^{75} + z^{100} + \cdots)\,D, \\
C &= (1 + z^{50} + z^{100} + z^{150} + z^{200} + \cdots)\,Q.
\end{aligned}
\]

Obviously $P_n = 1$ for all $n \ge 0$. And a little thought proves that we have
$N_n = \lfloor n/5 \rfloor + 1$: To make n cents out of pennies and nickels, we must choose
either 0 or 1 or . . . or $\lfloor n/5 \rfloor$ nickels, after which there’s only one way to supply
the requisite number of pennies. Thus $P_n$ and $N_n$ are simple; but the values
of $D_n$, $Q_n$, and $C_n$ are increasingly more complicated.

Margin note: How many pennies are there, really? If n is greater than, say, $10^{10}$, I bet that $P_n = 0$ in the “real world.”

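One way to deal with these formulas is to realize that $1 + z^m + z^{2m} + \cdots$
is just $1/(1 - z^m)$. Thus we can write

\[
\begin{aligned}
P &= 1/(1 - z), \\
N &= P/(1 - z^5), \\
D &= N/(1 - z^{10}), \\
Q &= D/(1 - z^{25}), \\
C &= Q/(1 - z^{50}).
\end{aligned}
\]

Multiplying by the denominators, we have

\[
\begin{aligned}
(1 - z)\,P &= 1, \\
(1 - z^5)\,N &= P, \\
(1 - z^{10})\,D &= N, \\
(1 - z^{25})\,Q &= D, \\
(1 - z^{50})\,C &= Q.
\end{aligned}
\]
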
Now we can equate coefficients of $z^n$ in these equations, getting recurrence
relations from which the desired coefficients can quickly be computed:

\[
\begin{aligned}
P_n &= P_{n-1} + [n=0], \\
N_n &= N_{n-5} + P_n, \\
D_n &= D_{n-10} + N_n, \\
Q_n &= Q_{n-25} + D_n, \\
C_n &= C_{n-50} + Q_n.
\end{aligned}
\]

For example, the coefficient of $z^n$ in $D = (1 - z^{25})Q$ is equal to $Q_n - Q_{n-25}$;
so we must have $Q_n - Q_{n-25} = D_n$, as claimed.
We could unfold these recurrences and find, for example, that
$Q_n = D_n + D_{n-25} + D_{n-50} + D_{n-75} + \cdots$, stopping when the subscripts get negative.
But the non-iterated form is convenient because each coefficient is computed
with just one addition, as in Pascal’s triangle.
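
The recurrences translate almost directly into a short program. The following Python sketch is an added illustration, not part of the original derivation, and the function name `count_change` is hypothetical. It relies on the fact that dividing a series by $1 - z^c$ turns coefficients $a_n$ into $b_n = b_{n-c} + a_n$, so one left-to-right pass per coin denomination is enough.

```python
def count_change(limit, coins=(1, 5, 10, 25, 50)):
    """Return [C_0, ..., C_limit], the number of ways to pay n cents."""
    # Series for "no coins allowed": only the amount 0 can be paid.
    ways = [1] + [0] * limit
    for coin in coins:
        # Dividing by (1 - z^coin) means new_n = new_{n-coin} + old_n.
        # Updating in place from left to right performs exactly one
        # addition per coefficient, as in the recurrences above.
        for n in range(coin, limit + 1):
            ways[n] += ways[n - coin]
    return ways


C = count_change(50)
print(C[13], C[50])  # 4 50
```

The two printed values agree with the four ways of paying 13 cents listed earlier and with the 50 ways of paying 50 cents found in the table that follows.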
Let’s use the recurrences to find $C_{50}$. First, $C_{50} = C_0 + Q_{50}$; so we want
to know $Q_{50}$. Then $Q_{50} = Q_{25} + D_{50}$, and $Q_{25} = Q_0 + D_{25}$; so we also want
to know $D_{50}$ and $D_{25}$. These $D_n$ depend in turn on $D_{40}$, $D_{30}$, $D_{20}$,
$D_{15}$, $D_{10}$, $D_5$, and on $N_{50}$, $N_{45}$, . . . , $N_5$. A simple calculation therefore suffices to
determine all the necessary coefficients:

n      0   5  10  15  20  25  30  35  40  45  50
P_n    1   1   1   1   1   1   1   1   1   1   1
N_n    1   2   3   4   5   6   7   8   9  10  11
D_n    1   2   4   6   9  12  16      25      36
Q_n    1                  13                  49
C_n    1                                      50

The final value in the table gives us our answer, $C_{50}$: There are exactly 50 ways
to leave a 50-cent tip.

Margin note: (Not counting the option of charging the tip to a credit card.)

How about a closed form for $C_n$? Multiplying the equations together
gives us the compact expression

\[ C = \frac{1}{1-z}\,\frac{1}{1-z^5}\,\frac{1}{1-z^{10}}\,\frac{1}{1-z^{25}}\,\frac{1}{1-z^{50}}, \tag{7.11} \]

but it’s not obvious how to get from here to the coefficient of $z^n$. Fortunately
there is a way; we’ll return to this problem later in the chapter.
More elegant formulas arise if we consider the problem of giving change
when we live in a land that mints coins of every positive integer denomination
(①, ②, ③, . . . ) instead of just the five we allowed before. The corresponding
generating function is an infinite product of fractions,

\[ \frac{1}{(1-z)(1-z^2)(1-z^3)\cdots}, \]

and the coefficient of $z^n$ when these factors are fully multiplied out is called
p(n), the number of partitions of n. A partition of n is a representation of n
as a sum of positive integers, disregarding order. For example, there are seven
different partitions of 5, namely

\[ 5 = 4+1 = 3+2 = 3+1+1 = 2+2+1 = 2+1+1+1 = 1+1+1+1+1; \]

hence p(5) = 7. (Also p(2) = 2, p(3) = 3, p(4) = 5, and p(6) = 11; it begins
to look as if p(n) is always a prime number. But p(7) = 15, spoiling the
pattern.) There is no closed form for p(n), but the theory of partitions is a
fascinating branch of mathematics in which many remarkable discoveries have
been made. For example, Ramanujan proved that $p(5n + 4) \equiv 0 \pmod 5$,
$p(7n + 5) \equiv 0 \pmod 7$, and $p(11n + 6) \equiv 0 \pmod{11}$, by making ingenious
transformations of generating functions (see Andrews [11, Chapter 10]).
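
The infinite product can be multiplied out one factor at a time, exactly as with the five coins. The Python sketch below is an added illustration (the name `partition_numbers` is hypothetical); it truncates every series at $z^{\text{limit}}$, which is harmless because the factor $1/(1 - z^k)$ cannot affect coefficients of powers below $z^k$. The last three lines spot-check Ramanujan’s congruences on a finite range, which is of course not a proof.

```python
def partition_numbers(limit):
    """Return [p(0), ..., p(limit)] from the product of 1/(1 - z^k)."""
    p = [1] + [0] * limit
    for part in range(1, limit + 1):
        # Multiply the truncated series by 1/(1 - z^part).
        for n in range(part, limit + 1):
            p[n] += p[n - part]
    return p


p = partition_numbers(200)
print(p[2:8])  # [2, 3, 5, 7, 11, 15]

# Spot checks of the congruences for all arguments up to 200.
print(all(p[n] % 5 == 0 for n in range(4, 201, 5)))
print(all(p[n] % 7 == 0 for n in range(5, 201, 7)))
print(all(p[n] % 11 == 0 for n in range(6, 201, 11)))
```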
Margin note: If physicists can get away with viewing light sometimes as a wave and sometimes as a particle, mathematicians should be able to view generating functions in two different ways.

7.2 BASIC MANEUVERS
Now let’s look more closely at some of the techniques that make
power series powerful.
First a few words about terminology and notation. Our generic generating
function has the form

\[ G(z) = g_0 + g_1 z + g_2 z^2 + \cdots = \sum_{n\ge0} g_n z^n, \tag{7.12} \]

and we say that G(z), or G for short, is the generating function for the sequence
$\langle g_0, g_1, g_2, \ldots \rangle$, which we also call $\langle g_n \rangle$. The coefficient $g_n$ of $z^n$
in G(z) is sometimes denoted $[z^n]\,G(z)$.
The sum in (7.12) runs over all $n \ge 0$, but we often find it more convenient
to extend the sum over all integers n. We can do this by simply
regarding $g_{-1} = g_{-2} = \cdots = 0$. In such cases we might still talk about the
sequence $\langle g_0, g_1, g_2, \ldots \rangle$, as if the $g_n$’s didn’t exist for negative n.
Two kinds of “closed forms” come up when we work with generating
functions. We might have a closed form for G(z), expressed in terms of z; or
we might have a closed form for $g_n$, expressed in terms of n. For example, the
generating function for Fibonacci numbers has the closed form $z/(1 - z - z^2)$;
the Fibonacci numbers themselves have the closed form $(\phi^n - \hat\phi^n)/\sqrt5$. The
context will explain what kind of closed form is meant.
Now a few words about perspective. The generating function G(z) ap-
pears to be two different entities, depending on how we view it. Sometimes
it is a function of a complex variable z, satisfying all the standard properties
proved in calculus books. And sometimes it is simply a formal power series,
with z acting as a placeholder. In the previous section, for example, we used
the second interpretation; we saw several examples in which z was substi-
tuted for some feature of a combinatorial object in a “sum” of such objects.
The coefficient of $z^n$ was then the number of combinatorial objects having n
occurrences of that feature.
When we view G(z) as a function of a complex variable, its convergence
becomes an issue. We said in Chapter 2 that the infinite series $\sum_{n\ge0} g_n z^n$
converges (absolutely) if and only if there’s a bounding constant A such that
the finite sums $\sum_{0\le n\le N} |g_n z^n|$ never exceed A, for any N. Therefore it’s easy
to see that if $\sum_{n\ge0} g_n z^n$ converges for some value $z = z_0$, it also converges for
all z with $|z| < |z_0|$. Furthermore, we must have $\lim_{n\to\infty} |g_n z_0^n| = 0$; hence, in
the notation of Chapter 9, $g_n = O(|1/z_0|^n)$ if there is convergence at $z_0$. And
conversely if $g_n = O(M^n)$, the series $\sum_{n\ge0} g_n z^n$ converges for all $|z| < 1/M$.
These are the basic facts about convergence of power series.
But for our purposes convergence is usually a red herring, unless we’re
trying to study the asymptotic behavior of the coefficients. Nearly every
operation we perform on generating functions can be justified rigorously as
an operation on formal power series, and such operations are legal even when
the series don’t converge. (The relevant theory can be found, for example, in
Bell [19], Niven [225], and Henrici [151, Chapter 1].)
Furthermore, even if we throw all caution to the winds and derive formulas
without any rigorous justification, we generally can take the results of our
derivation and prove them by induction. For example, the generating function
for the Fibonacci numbers converges only when $|z| < 1/\phi \approx 0.618$, but
we didn’t need to know that when we proved the formula $F_n = (\phi^n - \hat\phi^n)/\sqrt5$.
The latter formula, once discovered, can be verified directly, if we don’t trust
the theory of formal power series. Therefore we’ll ignore questions of convergence
in this chapter; it’s more a hindrance than a help.

Margin note: Even if we remove the tags from our mattresses.

So much for perspective. Next we look at our main tools for reshaping
generating functions-adding, shifting, changing variables, differentiating,
integrating, and multiplying. In what follows we assume that, unless stated
otherwise, F(z) and G(z) are the generating functions for the sequences $\langle f_n \rangle$
and $\langle g_n \rangle$. We also assume that the $f_n$’s and $g_n$’s are zero for negative n, since
this saves us some bickering with the limits of summation.
It’s pretty obvious what happens when we add constant multiples of
F and G together:

\[ \alpha F(z) + \beta G(z) = \alpha \sum_n f_n z^n + \beta \sum_n g_n z^n = \sum_n (\alpha f_n + \beta g_n) z^n. \tag{7.13} \]

This gives us the generating function for the sequence $\langle \alpha f_n + \beta g_n \rangle$.
Shifting a generating function isn’t much harder. To shift G(z) right by
m places, that is, to form the generating function for the sequence
$\langle 0, \ldots, 0, g_0, g_1, \ldots \rangle = \langle g_{n-m} \rangle$ with m leading 0’s, we simply multiply by $z^m$:

\[ z^m G(z) = \sum_n g_n z^{n+m} = \sum_n g_{n-m} z^n, \qquad \text{integer } m \ge 0. \tag{7.14} \]

This is the operation we used (twice), along with addition, to deduce the
equation $(1 - z - z^2)F(z) = z$ on our way to finding a closed form for the
Fibonacci numbers in Chapter 6.
And to shift G(z) left m places (that is, to form the generating function
for the sequence $\langle g_m, g_{m+1}, g_{m+2}, \ldots \rangle = \langle g_{n+m} \rangle$ with the first m elements
discarded), we subtract off the first m terms and then divide by $z^m$:

\[ \frac{G(z) - g_0 - g_1 z - \cdots - g_{m-1} z^{m-1}}{z^m} = \sum_{n\ge m} g_n z^{n-m} = \sum_{n\ge0} g_{n+m} z^n. \tag{7.15} \]

(We can’t extend this last sum over all n unless $g_0 = \cdots = g_{m-1} = 0$.)
Replacing the z by a constant multiple is another of our tricks:

\[ G(cz) = \sum_n g_n (cz)^n = \sum_n c^n g_n z^n; \tag{7.16} \]

this yields the generating function for the sequence $\langle c^n g_n \rangle$. The special case
c = −1 is particularly useful.
Often we want to bring down a factor of n into the coefficient. Differentiation
is what lets us do that:

\[ G'(z) = g_1 + 2g_2 z + 3g_3 z^2 + \cdots = \sum_n (n+1)\, g_{n+1} z^n. \tag{7.17} \]

Margin note: I fear d generating-function dz’s.

Shifting this right one place gives us a form that’s sometimes more useful,

\[ zG'(z) = \sum_n n\, g_n z^n. \tag{7.18} \]

This is the generating function for the sequence $\langle n g_n \rangle$. Repeated differentiation
would allow us to multiply $g_n$ by any desired polynomial in n.
Integration, the inverse operation, lets us divide the terms by n:

\[ \int_0^z G(t)\,dt = g_0 z + \tfrac12 g_1 z^2 + \tfrac13 g_2 z^3 + \cdots = \sum_{n\ge1} \frac1n\, g_{n-1} z^n. \tag{7.19} \]

(Notice that the constant term is zero.) If we want the generating function
for $\langle g_n/n \rangle$ instead of $\langle g_{n-1}/n \rangle$, we should first shift left one place, replacing
G(t) by $(G(t) - g_0)/t$ in the integral.
Finally, here’s how we multiply generating functions together:

\[
\begin{aligned}
F(z)G(z) &= (f_0 + f_1 z + f_2 z^2 + \cdots)(g_0 + g_1 z + g_2 z^2 + \cdots) \\
&= (f_0 g_0) + (f_0 g_1 + f_1 g_0) z + (f_0 g_2 + f_1 g_1 + f_2 g_0) z^2 + \cdots \\
&= \sum_n \Bigl( \sum_k f_k g_{n-k} \Bigr) z^n.
\end{aligned}
\tag{7.20}
\]

As we observed in Chapter 5, this gives the generating function for the sequence
$\langle h_n \rangle$, the convolution of $\langle f_n \rangle$ and $\langle g_n \rangle$. The sum $h_n = \sum_k f_k g_{n-k}$ can
also be written $h_n = \sum_{k=0}^{n} f_k g_{n-k}$, because $f_k = 0$ when k < 0 and $g_{n-k} = 0$
when k > n. Multiplication/convolution is a little more complicated than
the other operations, but it’s very useful; so useful that we will spend all of
Section 7.5 below looking at examples of it.
Multiplication has several special cases that are worth considering as
operations in themselves. We’ve already seen one of these: When $F(z) = z^m$
we get the shifting operation (7.14). In that case the sum $h_n$ becomes the
single term $g_{n-m}$, because all $f_k$’s are 0 except for $f_m = 1$.
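Table 320 Generating function manipulations.

\[ \alpha F(z) + \beta G(z) = \sum_n (\alpha f_n + \beta g_n) z^n \]
\[ z^m G(z) = \sum_n g_{n-m} z^n, \qquad \text{integer } m \ge 0 \]
\[ \frac{G(z) - g_0 - g_1 z - \cdots - g_{m-1} z^{m-1}}{z^m} = \sum_{n\ge0} g_{n+m} z^n, \qquad \text{integer } m \ge 0 \]
\[ G(cz) = \sum_n c^n g_n z^n \]
\[ G'(z) = \sum_n (n+1)\, g_{n+1} z^n \]
\[ zG'(z) = \sum_n n\, g_n z^n \]
\[ \int_0^z G(t)\,dt = \sum_{n\ge1} \frac1n\, g_{n-1} z^n \]
\[ F(z)G(z) = \sum_n \Bigl( \sum_k f_k g_{n-k} \Bigr) z^n \]
\[ \frac{1}{1-z}\, G(z) = \sum_n \Bigl( \sum_{k\le n} g_k \Bigr) z^n \]
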
Another useful special case arises when F(z) is the familiar function
$1/(1-z) = 1 + z + z^2 + \cdots$; then all $f_k$’s (for $k \ge 0$) are 1 and we have
the important formula

\[ \frac{1}{1-z}\, G(z) = \sum_n \Bigl( \sum_{k\ge0} g_{n-k} \Bigr) z^n = \sum_n \Bigl( \sum_{k\le n} g_k \Bigr) z^n. \tag{7.21} \]

Multiplying a generating function by $1/(1-z)$ gives us the generating function
for the cumulative sums of the original sequence.
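
Several of these maneuvers are easy to try out on truncated coefficient lists. The Python sketch below is an added illustration; the helper names `multiply`, `shift_right`, and `z_derivative` are hypothetical, and each list holds only the first few coefficients of a formal power series, lowest power first.

```python
def multiply(f, g):
    """Convolution (7.20), truncated to the shorter of the two lists."""
    n_terms = min(len(f), len(g))
    return [sum(f[k] * g[n - k] for k in range(n + 1)) for n in range(n_terms)]


def shift_right(g, m):
    """Coefficients of z^m G(z), as in (7.14), truncated to len(g) terms."""
    return ([0] * m + g)[:len(g)]


def z_derivative(g):
    """Coefficients of z G'(z), as in (7.18): multiply g_n by n."""
    return [n * c for n, c in enumerate(g)]


ones = [1] * 8                      # 1/(1 - z)
g = [1, 4, 6, 4, 1, 0, 0, 0]        # (1 + z)^4

print(multiply(ones, g))            # [1, 5, 11, 15, 16, 16, 16, 16]
print(multiply(ones, ones))         # [1, 2, 3, 4, 5, 6, 7, 8]
print(shift_right(ones, 2))         # [0, 0, 1, 1, 1, 1, 1, 1]
print(z_derivative(ones))           # [0, 1, 2, 3, 4, 5, 6, 7]
```

The first line of output shows the cumulative sums promised by (7.21); the second shows that dividing $1/(1-z)$ by $(1-z)$ produces the coefficients 1, 2, 3, . . . .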
Table 320 summarizes the operations we’ve discussed so far. To use
all these manipulations effectively it helps to have a healthy repertoire of
generating functions in stock. Table 321 lists the simplest ones; we can use
those to get started and to solve quite a few problems.
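Each of the generating functions in Table 321 is important enough to
be memorized. Many of them are special cases of the others, and many of
them can be derived quickly from the others by using the basic operations of
Table 320; therefore the memory work isn’t very hard.

Margin note: Hint: If the sequence consists of binomial coefficients, its generating function usually involves a binomial, $1 \pm z$.

Table 321 Simple sequences and their generating functions.

\[
\begin{array}{lll}
\text{sequence} & \text{generating function} & \text{closed form} \\[4pt]
\langle 1, 0, 0, 0, 0, 0, \ldots \rangle & \sum_{n\ge0} [n=0]\, z^n & 1 \\
\langle 0, \ldots, 0, 1, 0, 0, \ldots \rangle & \sum_{n\ge0} [n=m]\, z^n & z^m \\
\langle 1, 1, 1, 1, 1, 1, \ldots \rangle & \sum_{n\ge0} z^n & \dfrac{1}{1-z} \\
\langle 1, -1, 1, -1, 1, -1, \ldots \rangle & \sum_{n\ge0} (-1)^n z^n & \dfrac{1}{1+z} \\
\langle 1, 0, 1, 0, 1, 0, \ldots \rangle & \sum_{n\ge0} [2 \backslash n]\, z^n & \dfrac{1}{1-z^2} \\
\langle 1, 0, \ldots, 0, 1, 0, \ldots, 0, 1, 0, \ldots \rangle & \sum_{n\ge0} [m \backslash n]\, z^n & \dfrac{1}{1-z^m} \\
\langle 1, 2, 3, 4, 5, 6, \ldots \rangle & \sum_{n\ge0} (n+1)\, z^n & \dfrac{1}{(1-z)^2} \\
\langle 1, 2, 4, 8, 16, 32, \ldots \rangle & \sum_{n\ge0} 2^n z^n & \dfrac{1}{1-2z} \\
\langle 1, 4, 6, 4, 1, 0, 0, \ldots \rangle & \sum_{n\ge0} \binom{4}{n} z^n & (1+z)^4 \\
\langle 1, c, \binom{c}{2}, \binom{c}{3}, \ldots \rangle & \sum_{n\ge0} \binom{c}{n} z^n & (1+z)^c \\
\langle 1, c, \binom{c+1}{2}, \binom{c+2}{3}, \ldots \rangle & \sum_{n\ge0} \binom{c+n-1}{n} z^n & \dfrac{1}{(1-z)^c} \\
\langle 1, c, c^2, c^3, \ldots \rangle & \sum_{n\ge0} c^n z^n & \dfrac{1}{1-cz} \\
\langle 1, \binom{m+1}{m}, \binom{m+2}{m}, \binom{m+3}{m}, \ldots \rangle & \sum_{n\ge0} \binom{m+n}{m} z^n & \dfrac{1}{(1-z)^{m+1}} \\
\langle 0, 1, \tfrac12, \tfrac13, \tfrac14, \ldots \rangle & \sum_{n\ge1} \dfrac{1}{n}\, z^n & \ln \dfrac{1}{1-z} \\
\langle 0, 1, -\tfrac12, \tfrac13, -\tfrac14, \ldots \rangle & \sum_{n\ge1} \dfrac{(-1)^{n+1}}{n}\, z^n & \ln(1+z) \\
\langle 1, 1, \tfrac12, \tfrac16, \tfrac1{24}, \tfrac1{120}, \ldots \rangle & \sum_{n\ge0} \dfrac{1}{n!}\, z^n & e^z
\end{array}
\]
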
For example, let’s consider the sequence $\langle 1, 2, 3, 4, \ldots \rangle$, whose generating
function $1/(1-z)^2$ is often useful. This generating function appears near the
middle of Table 321, and it’s also the special case m = 1 of
$\langle 1, \binom{m+1}{m}, \binom{m+2}{m}, \binom{m+3}{m}, \ldots \rangle$, which appears further down; it’s also the special case c = 2 of
the closely related sequence $\langle 1, c, \binom{c+1}{2}, \binom{c+2}{3}, \ldots \rangle$. We can derive it from the
generating function for $\langle 1, 1, 1, 1, \ldots \rangle$ by taking cumulative sums as in (7.21);
that is, by dividing $1/(1-z)$ by $(1-z)$. Or we can derive it from $\langle 1, 1, 1, 1, \ldots \rangle$
by differentiation, using (7.17).

Margin note: OK, OK, I’m convinced already.

The sequence $\langle 1, 0, 1, 0, \ldots \rangle$ is another one whose generating function can
be obtained in many ways. We can obviously derive the formula
$\sum_n z^{2n} = 1/(1 - z^2)$ by substituting $z^2$ for z in the identity $\sum_n z^n = 1/(1 - z)$; we can
also apply cumulative summation to the sequence $\langle 1, -1, 1, -1, \ldots \rangle$, whose
generating function is $1/(1+z)$, getting $1/\bigl((1+z)(1-z)\bigr) = 1/(1-z^2)$. And
there’s also a third way, which is based on a general method for extracting
the even-numbered terms $\langle g_0, 0, g_2, 0, g_4, 0, \ldots \rangle$ of any given sequence: If we
add G(−z) to G(+z) we get

\[ G(z) + G(-z) = \sum_n g_n \bigl(1 + (-1)^n\bigr) z^n = 2 \sum_n g_n [n \text{ even}]\, z^n; \]

therefore

\[ \frac{G(z) + G(-z)}{2} = \sum_n g_{2n} z^{2n}. \tag{7.22} \]

The odd-numbered terms can be extracted in a similar way,

\[ \frac{G(z) - G(-z)}{2} = \sum_n g_{2n+1} z^{2n+1}. \tag{7.23} \]

In the special case where $g_n = 1$ and $G(z) = 1/(1-z)$, the generating function
for $\langle 1, 0, 1, 0, \ldots \rangle$ is
$\frac12\bigl(G(z) + G(-z)\bigr) = \frac12\bigl(\frac{1}{1-z} + \frac{1}{1+z}\bigr) = \frac{1}{1-z^2}$.
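Let’s try this extraction trick on the generating function for Fibonacci
numbers. We know that $\sum_n F_n z^n = z/(1 - z - z^2)$; hence

\[
\begin{aligned}
\sum_n F_{2n} z^{2n} &= \frac12 \left( \frac{z}{1 - z - z^2} + \frac{-z}{1 + z - z^2} \right) \\
&= \frac12 \left( \frac{z + z^2 - z^3 - z + z^2 + z^3}{(1 - z^2)^2 - z^2} \right) = \frac{z^2}{1 - 3z^2 + z^4}.
\end{aligned}
\]

This generates the sequence $\langle F_0, 0, F_2, 0, F_4, \ldots \rangle$; hence the sequence of alternate
F’s, $\langle F_0, F_2, F_4, F_6, \ldots \rangle = \langle 0, 1, 3, 8, \ldots \rangle$, has a simple generating function:

\[ \sum_n F_{2n} z^n = \frac{z}{1 - 3z + z^2}. \tag{7.24} \]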
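
A quick numerical check of (7.24) is possible by expanding both rational functions as power series. The Python sketch below is an added illustration; the helper `expand` is hypothetical and simply solves $\text{den}(z) \cdot \text{series} = \text{num}(z)$ coefficient by coefficient, using exact rational arithmetic.

```python
from fractions import Fraction


def expand(num, den, n_terms):
    """First n_terms power-series coefficients of num(z)/den(z).

    num and den are coefficient lists, lowest power first;
    den[0] must be nonzero.
    """
    coeffs = []
    for n in range(n_terms):
        acc = Fraction(num[n]) if n < len(num) else Fraction(0)
        # Move the already-known terms of den(z) * series to the right side.
        for k in range(1, min(n, len(den) - 1) + 1):
            acc -= den[k] * coeffs[n - k]
        coeffs.append(acc / den[0])
    return coeffs


fib = expand([0, 1], [1, -1, -1], 20)       # z / (1 - z - z^2)
even_fib = expand([0, 1], [1, -3, 1], 10)   # z / (1 - 3z + z^2)

print([int(x) for x in even_fib[:6]])       # [0, 1, 3, 8, 21, 55]
print(even_fib == fib[0::2])                # True
```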
7.3 SOLVING RECURRENCES
Now let’s focus our attention on one of the most important uses of
generating functions: the solution of recurrence relations.
Given a sequence $\langle g_n \rangle$ that satisfies a given recurrence, we seek a closed
form for $g_n$ in terms of n. A solution to this problem via generating functions
proceeds in four steps that are almost mechanical enough to be programmed
on a computer:
1  Write down a single equation that expresses $g_n$ in terms of other elements
   of the sequence. This equation should be valid for all integers n, assuming
   that $g_{-1} = g_{-2} = \cdots = 0$.
2  Multiply both sides of the equation by $z^n$ and sum over all n. This gives,
   on the left, the sum $\sum_n g_n z^n$, which is the generating function G(z). The
   right-hand side should be manipulated so that it becomes some other
   expression involving G(z).
3  Solve the resulting equation, getting a closed form for G(z).
4  Expand G(z) into a power series and read off the coefficient of $z^n$; this is
   a closed form for $g_n$.
This method works because the single function G(z) represents the entire
sequence $\langle g_n \rangle$ in such a way that many manipulations are possible.
Example 1: Fibonacci numbers revisited.
For example, let’s rerun the derivation of Fibonacci numbers from Chapter 6.
In that chapter we were feeling our way, learning a new method; now
we can be more systematic. The given recurrence is

\[ g_0 = 0; \qquad g_1 = 1; \qquad g_n = g_{n-1} + g_{n-2}, \quad \text{for } n \ge 2. \]

We will find a closed form for $g_n$ by using the four steps above.
Step 1 tells us to write the recurrence as a “single equation” for $g_n$. We
could say

\[ g_n = \begin{cases} 0, & \text{if } n \le 0; \\ 1, & \text{if } n = 1; \\ g_{n-1} + g_{n-2}, & \text{if } n > 1; \end{cases} \]

but this is cheating. Step 1 really asks for a formula that doesn’t involve a
case-by-case construction. The single equation

\[ g_n = g_{n-1} + g_{n-2} \]

works for $n \ge 2$, and it also holds when $n \le 0$ (because we have $g_0 = 0$
and $g_{\text{negative}} = 0$). But when n = 1 we get 1 on the left and 0 on the right.
Fortunately the problem is easy to fix, since we can add [n = 1] to the right;
this adds 1 when n = 1, and it makes no change when n ≠ 1. So, we have

\[ g_n = g_{n-1} + g_{n-2} + [n=1]; \]

this is the equation called for in Step 1.
Step 2 now asks us to transform the equation for $\langle g_n \rangle$ into an equation
for $G(z) = \sum_n g_n z^n$. The task is not difficult:

\[
\begin{aligned}
G(z) = \sum_n g_n z^n &= \sum_n g_{n-1} z^n + \sum_n g_{n-2} z^n + \sum_n [n=1]\, z^n \\
&= \sum_n g_n z^{n+1} + \sum_n g_n z^{n+2} + z \\
&= zG(z) + z^2 G(z) + z.
\end{aligned}
\]

Step 3 is also simple in this case; we have

\[ G(z) = \frac{z}{1 - z - z^2}, \]

which of course comes as no surprise.
Step 4 is the clincher. We carried it out in Chapter 6 by having a sudden
flash of inspiration; let’s go more slowly now, so that we can get through
Step 4 safely later, when we meet problems that are more difficult. What is

\[ [z^n]\, \frac{z}{1 - z - z^2}, \]

the coefficient of $z^n$ when $z/(1 - z - z^2)$ is expanded in a power series? More
generally, if we are given any rational function

\[ R(z) = \frac{P(z)}{Q(z)}, \]

where P and Q are polynomials, what is the coefficient $[z^n]\,R(z)$?
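There’s one kind of rational function whose coefficients are particularly
nice, namely

\[ \frac{a}{(1 - \rho z)^{m+1}} = \sum_{n\ge0} \binom{m+n}{m} a \rho^n z^n. \tag{7.25} \]

(The case ρ = 1 appears in Table 321, and we can get the general formula
shown here by substituting ρz for z.) A finite sum of functions like (7.25),

\[ S(z) = \frac{a_1}{(1 - \rho_1 z)^{m_1+1}} + \frac{a_2}{(1 - \rho_2 z)^{m_2+1}} + \cdots + \frac{a_l}{(1 - \rho_l z)^{m_l+1}}, \tag{7.26} \]

also has nice coefficients,

\[ [z^n]\, S(z) = a_1 \binom{m_1+n}{m_1} \rho_1^n + a_2 \binom{m_2+n}{m_2} \rho_2^n + \cdots + a_l \binom{m_l+n}{m_l} \rho_l^n. \tag{7.27} \]
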
We will show that every rational function R(z) such that R(0) ≠ ∞ can be
expressed in the form

\[ R(z) = S(z) + T(z), \tag{7.28} \]

where S(z) has the form (7.26) and T(z) is a polynomial. Therefore there is a
closed form for the coefficients $[z^n]\,R(z)$. Finding S(z) and T(z) is equivalent
to finding the “partial fraction expansion” of R(z).
Notice that S(z) = ∞ when z has the values $1/\rho_1$, . . . , $1/\rho_l$. Therefore
the numbers $\rho_k$ that we need to find, if we’re going to succeed in expressing
R(z) in the desired form S(z) + T(z), must be the reciprocals of the numbers
$\alpha_k$ where $Q(\alpha_k) = 0$. (Recall that R(z) = P(z)/Q(z), where P and Q are
polynomials; we have R(z) = ∞ only if Q(z) = 0.)
Suppose Q(z) has the form

\[ Q(z) = q_0 + q_1 z + \cdots + q_m z^m, \qquad \text{where } q_0 \ne 0 \text{ and } q_m \ne 0. \]

The “reflected” polynomial

\[ Q^R(z) = q_0 z^m + q_1 z^{m-1} + \cdots + q_m \]

has an important relation to Q(z):

\[ Q^R(z) = q_0 (z - \rho_1) \ldots (z - \rho_m) \iff Q(z) = q_0 (1 - \rho_1 z) \ldots (1 - \rho_m z). \]

Thus, the roots of $Q^R$ are the reciprocals of the roots of Q, and vice versa.
We can therefore find the numbers $\rho_k$ we seek by factoring the reflected
polynomial $Q^R(z)$.
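For example, in the Fibonacci case we have

\[ Q(z) = 1 - z - z^2; \qquad Q^R(z) = z^2 - z - 1. \]

The roots of $Q^R$ can be found by setting (a, b, c) = (1, −1, −1) in the quadratic
formula $\bigl(-b \pm \sqrt{b^2 - 4ac}\bigr)/2a$; we find that they are

\[ \phi = \frac{1 + \sqrt5}{2} \qquad \text{and} \qquad \hat\phi = \frac{1 - \sqrt5}{2}. \]

Therefore $Q^R(z) = (z - \phi)(z - \hat\phi)$ and $Q(z) = (1 - \phi z)(1 - \hat\phi z)$.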
Once we’ve found the ρ’s, we can proceed to find the partial fraction
expansion. It’s simplest if all the roots are distinct, so let’s consider that
special case first. We might as well state and prove the general result formally:
Rational Expansion Theorem for Distinct Roots.
If R(z) = P(z)/Q(z), where $Q(z) = q_0 (1 - \rho_1 z) \ldots (1 - \rho_l z)$ and the
numbers $(\rho_1, \ldots, \rho_l)$ are distinct, and if P(z) is a polynomial of degree less
than l, then

\[ [z^n]\, R(z) = a_1 \rho_1^n + \cdots + a_l \rho_l^n, \qquad \text{where } a_k = \frac{-\rho_k P(1/\rho_k)}{Q'(1/\rho_k)}. \tag{7.29} \]

Proof: Let $a_1$, . . . , $a_l$ be the stated constants. Formula (7.29) holds if R(z) =
P(z)/Q(z) is equal to

\[ S(z) = \frac{a_1}{1 - \rho_1 z} + \cdots + \frac{a_l}{1 - \rho_l z}. \]

And we can prove that R(z) = S(z) by showing that the function T(z) =
R(z) − S(z) is not infinite as $z \to 1/\rho_k$. For this will show that the rational
function T(z) is never infinite; hence T(z) must be a polynomial. We also can
show that T(z) → 0 as z → ∞; hence T(z) must be zero.

Margin note: Impress your parents by leaving the book open at this page.

Let $\alpha_k = 1/\rho_k$. To prove that $\lim_{z\to\alpha_k} T(z) \ne \infty$, it suffices to show that
$\lim_{z\to\alpha_k} (z - \alpha_k) T(z) = 0$, because T(z) is a rational function of z. Thus we
want to show that

\[ \lim_{z\to\alpha_k} (z - \alpha_k) R(z) = \lim_{z\to\alpha_k} (z - \alpha_k) S(z). \]

The right-hand limit equals $\lim_{z\to\alpha_k} a_k (z - \alpha_k)/(1 - \rho_k z) = -a_k/\rho_k$, because
$(1 - \rho_k z) = -\rho_k (z - \alpha_k)$ and $(z - \alpha_k)/(1 - \rho_j z) \to 0$ for j ≠ k. The left-hand
limit is

\[ \lim_{z\to\alpha_k} (z - \alpha_k) \frac{P(z)}{Q(z)} = P(\alpha_k) \lim_{z\to\alpha_k} \frac{z - \alpha_k}{Q(z)} = \frac{P(\alpha_k)}{Q'(\alpha_k)}, \]

by L’Hospital’s rule; and this equals $-a_k/\rho_k$ by the definition of $a_k$ in (7.29).
Thus the theorem is proved.
Returning to the Fibonacci example, we have P(z) = z and
$Q(z) = 1 - z - z^2 = (1 - \phi z)(1 - \hat\phi z)$; hence $Q'(z) = -1 - 2z$, and

\[ \frac{-\rho P(1/\rho)}{Q'(1/\rho)} = \frac{-1}{-1 - 2/\rho} = \frac{\rho}{\rho + 2}. \]

According to (7.29), the coefficient of $\phi^n$ in $[z^n]\,R(z)$ is therefore
$\phi/(\phi + 2) = 1/\sqrt5$; the coefficient of $\hat\phi^n$ is $\hat\phi/(\hat\phi + 2) = -1/\sqrt5$.
So the theorem tells us that $F_n = (\phi^n - \hat\phi^n)/\sqrt5$, as in (6.123).
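
The distinct-roots formula (7.29) is simple enough to evaluate numerically. The Python sketch below is an added illustration using floating-point arithmetic; the function names are hypothetical, and P and Q′ are passed in as ordinary Python functions. It recomputes the Fibonacci coefficients $a_k$ and compares the resulting closed form with the recurrence.

```python
import math


def expansion_coefficients(P, dQ, rhos):
    """a_k = -rho_k * P(1/rho_k) / Q'(1/rho_k), as in (7.29)."""
    return [-rho * P(1 / rho) / dQ(1 / rho) for rho in rhos]


phi = (1 + math.sqrt(5)) / 2
phi_hat = (1 - math.sqrt(5)) / 2

# Fibonacci case: P(z) = z, Q(z) = 1 - z - z^2, so Q'(z) = -1 - 2z.
a = expansion_coefficients(lambda z: z, lambda z: -1 - 2 * z, [phi, phi_hat])


def fib_closed(n):
    """Closed form a_1 * phi^n + a_2 * phi_hat^n, evaluated in floats."""
    return a[0] * phi**n + a[1] * phi_hat**n


fib = [0, 1]
while len(fib) < 30:
    fib.append(fib[-1] + fib[-2])

print(a)  # close to [1/sqrt(5), -1/sqrt(5)]
print(all(round(fib_closed(n)) == fib[n] for n in range(30)))  # True
```

Rounding is used in the comparison because the closed form is evaluated in floating point; for much larger n exact arithmetic would be the safer choice.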
When Q(z) has repeated roots, the calculations become more difficult,
but we can beef up the proof of the theorem and prove the following more
general result:
General Expansion Theorem for Rational Generating Functions.
If R(z) = P(z)/Q(z), where $Q(z) = q_0 (1 - \rho_1 z)^{d_1} \ldots (1 - \rho_l z)^{d_l}$ and the
numbers $(\rho_1, \ldots, \rho_l)$ are distinct, and if P(z) is a polynomial of degree less
than $d_1 + \cdots + d_l$, then

\[ [z^n]\, R(z) = f_1(n) \rho_1^n + \cdots + f_l(n) \rho_l^n \qquad \text{for all } n \ge 0, \tag{7.30} \]

where each $f_k(n)$ is a polynomial of degree $d_k - 1$ with leading coefficient

\[ a_k = \frac{(-\rho_k)^{d_k} P(1/\rho_k)\, d_k}{Q^{(d_k)}(1/\rho_k)} = \frac{P(1/\rho_k)}{(d_k - 1)!\; q_0 \prod_{j\ne k} (1 - \rho_j/\rho_k)^{d_j}}. \tag{7.31} \]

This can be proved by induction on $\max(d_1, \ldots, d_l)$, using the fact that

\[ R(z) - \frac{a_1 (d_1 - 1)!}{(1 - \rho_1 z)^{d_1}} - \cdots - \frac{a_l (d_l - 1)!}{(1 - \rho_l z)^{d_l}} \]

is a rational function whose denominator polynomial is not divisible by
$(1 - \rho_k z)^{d_k}$ for any k.
Example 2: A more-or-less random recurrence.
Now that we’ve seen some general methods, we’re ready to tackle new
problems. Let’s try to find a closed form for the recurrence

\[ g_0 = g_1 = 1; \qquad g_n = g_{n-1} + 2g_{n-2} + (-1)^n, \quad \text{for } n \ge 2. \tag{7.32} \]

It’s always a good idea to make a table of small cases first, and the recurrence
lets us do that easily:

n        0   1   2   3   4   5   6   7
(-1)^n   1  -1   1  -1   1  -1   1  -1
g_n      1   1   4   5  14  23  52  97

No closed form is evident, and this sequence isn’t even listed in Sloane’s
Handbook [270]; so we need to go through the four-step process if we want
to discover the solution.
Step 1 is easy, since we merely need to insert fudge factors to fix things
when n < 2: The equation

\[ g_n = g_{n-1} + 2g_{n-2} + (-1)^n [n \ge 0] + [n=1] \]

holds for all integers n. Now we can carry out Step 2:

\[
\begin{aligned}
G(z) = \sum_n g_n z^n &= \sum_n g_{n-1} z^n + 2 \sum_n g_{n-2} z^n + \sum_{n\ge0} (-1)^n z^n + \sum_{n=1} z^n \\
&= zG(z) + 2z^2 G(z) + \frac{1}{1+z} + z.
\end{aligned}
\]

Margin note: N.B.: The upper index on $\sum_{n=1} z^n$ is not missing!

(Incidentally, we could also have used $\binom{-1}{n}$ instead of $(-1)^n [n \ge 0]$, thereby
getting $\sum_n \binom{-1}{n} z^n = (1+z)^{-1}$ by the binomial theorem.) Step 3 is elementary
algebra, which yields

\[ G(z) = \frac{1 + z(1+z)}{(1+z)(1 - z - 2z^2)} = \frac{1 + z + z^2}{(1 - 2z)(1 + z)^2}. \]

And that leaves us with Step 4.
The squared factor in the denominator is a bit troublesome, since we
know that repeated roots are more complicated than distinct roots; but there
it is. We have two roots, $\rho_1 = 2$ and $\rho_2 = -1$; the general expansion theorem
(7.30) tells us that

\[ g_n = a_1 2^n + (a_2 n + c)(-1)^n \]

for some constant c, where

\[ a_1 = \frac{1 + 1/2 + 1/4}{(1 + 1/2)^2} = \frac79; \qquad a_2 = \frac{1 - 1 + 1}{1 - 2/(-1)} = \frac13. \]

(The second formula for $a_k$ in (7.31) is easier to use than the first one when
the denominator has nice factors. We simply substitute $z = 1/\rho_k$ everywhere
in R(z), except in the factor where this gives zero, and divide by $(d_k - 1)!$; this
gives the coefficient of $n^{d_k-1} \rho_k^n$.) Plugging in n = 0 tells us that the value of
the remaining constant c had better be $\frac29$; hence our answer is

\[ g_n = \tfrac79\, 2^n + \bigl(\tfrac13 n + \tfrac29\bigr)(-1)^n. \tag{7.33} \]

It doesn’t hurt to check the cases n = 1 and 2, just to be sure that we didn’t
foul up. Maybe we should even try n = 3, since this formula looks weird. But
it’s correct, all right.
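
Checking more than three cases by hand gets tedious, so here is a small Python sketch, added as an illustration, that compares the recurrence (7.32) with the closed form (7.33) in exact rational arithmetic. The function names are hypothetical.

```python
from fractions import Fraction


def g_by_recurrence(limit):
    """Return [g_0, ..., g_limit] from g_n = g_{n-1} + 2 g_{n-2} + (-1)^n."""
    g = [1, 1]
    for n in range(2, limit + 1):
        g.append(g[n - 1] + 2 * g[n - 2] + (-1) ** n)
    return g[:limit + 1]


def g_closed(n):
    """Closed form (7.33), evaluated exactly."""
    return Fraction(7, 9) * 2**n + (Fraction(1, 3) * n + Fraction(2, 9)) * (-1) ** n


g = g_by_recurrence(30)
print(g[:8])                                         # [1, 1, 4, 5, 14, 23, 52, 97]
print(all(g_closed(n) == g[n] for n in range(31)))   # True
```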
Could we have discovered (7.33) by guesswork? Perhaps after tabulating
a few more values we may have observed that $g_{n+1} \approx 2g_n$ when n is large.
And with chutzpah and luck we might even have been able to smoke out
the constant $\frac79$. But it sure is simpler and more reliable to have generating
functions as a tool.
Example 3: Mutually recursive sequences.
Sometimes we have two or more recurrences that depend on each other.
Then we can form generating functions for both of them, and solve both by
a simple extension of our four-step method.
For example, let’s return to the problem of 3 × n domino tilings that we
explored earlier this chapter. If we want to know only the total number of
ways, $U_n$, to cover a 3 × n rectangle with dominoes, without breaking this
number down into vertical dominoes versus horizontal dominoes, we needn’t
go into as much detail as we did before. We can merely set up the recurrences

\[
\begin{aligned}
U_0 &= 1, & U_1 &= 0; & V_0 &= 0, & V_1 &= 1; \\
U_n &= 2V_{n-1} + U_{n-2}, & & & V_n &= U_{n-1} + V_{n-2}, & & \text{for } n \ge 2.
\end{aligned}
\]

Here $V_n$ is the number of ways to cover a 3 × n rectangle-minus-corner, using
(3n − 1)/2 dominoes. These recurrences are easy to discover, if we consider
the possible domino configurations at the rectangle’s left edge, as before. Here
are the values of $U_n$ and $V_n$ for small n:

n      0   1   2   3   4   5   6   7
U_n    1   0   3   0  11   0  41   0          (7.34)
V_n    0   1   0   4   0  15   0  56

Let’s find closed forms, in four steps. First (Step 1), we have

\[ U_n = 2V_{n-1} + U_{n-2} + [n=0], \qquad V_n = U_{n-1} + V_{n-2}, \]

for all n. Hence (Step 2),

\[ U(z) = 2zV(z) + z^2 U(z) + 1, \qquad V(z) = zU(z) + z^2 V(z). \]

Now (Step 3) we must solve two equations in two unknowns; but these are
easy, since the second equation yields $V(z) = zU(z)/(1 - z^2)$; we find

\[ U(z) = \frac{1 - z^2}{1 - 4z^2 + z^4}; \qquad V(z) = \frac{z}{1 - 4z^2 + z^4}. \]

(We had this formula for U(z) in (7.10), but with $z^3$ instead of $z^2$. In that
derivation, n was the number of dominoes; now it’s the width of the rectangle.)
The denominator $1 - 4z^2 + z^4$ is a function of $z^2$; this is what makes
$U_{2n+1} = 0$ and $V_{2n} = 0$, as they should be. We can take advantage of this
nice property of $z^2$ by retaining $z^2$ when we factor the denominator: We need
not take $1 - 4z^2 + z^4$ all the way to a product of four factors $(1 - \rho_k z)$, since
two factors of the form $(1 - \rho_k z^2)$ will be enough to tell us the coefficients. In
other words if we consider the generating function

\[ W(z) = \frac{1}{1 - 4z + z^2} = W_0 + W_1 z + W_2 z^2 + \cdots, \]

we will have $V(z) = zW(z^2)$ and $U(z) = (1 - z^2)W(z^2)$; hence $V_{2n+1} = W_n$
and $U_{2n} = W_n - W_{n-1}$. We save time and energy by working with the simpler
function W(z).
The factors of $1 - 4z + z^2$ are $(z - 2 - \sqrt3)$ and $(z - 2 + \sqrt3)$, and they can
also be written $(1 - (2 + \sqrt3)z)$ and $(1 - (2 - \sqrt3)z)$ because this polynomial
is its own reflection. Thus it turns out that we have

\[
\begin{aligned}
V_{2n+1} = W_n &= \frac{3 + 2\sqrt3}{6} (2 + \sqrt3)^n + \frac{3 - 2\sqrt3}{6} (2 - \sqrt3)^n; \\
U_{2n} = W_n - W_{n-1} &= \frac{3 + \sqrt3}{6} (2 + \sqrt3)^n + \frac{3 - \sqrt3}{6} (2 - \sqrt3)^n \\
&= \frac{(2 + \sqrt3)^n}{3 - \sqrt3} + \frac{(2 - \sqrt3)^n}{3 + \sqrt3}.
\end{aligned}
\tag{7.37}
\]

This is the desired closed form for the number of 3 × n domino tilings.
Incidentally, we can simplify the formula for $U_{2n}$ by realizing that the
second term always lies between 0 and 1. The number $U_{2n}$ is an integer, so
we have

\[ U_{2n} = \left\lceil \frac{(2 + \sqrt3)^n}{3 - \sqrt3} \right\rceil, \qquad \text{for } n \ge 0. \tag{7.38} \]

In fact, the other term $(2 - \sqrt3)^n/(3 + \sqrt3)$ is extremely small when n is
large, because $2 - \sqrt3 \approx 0.268$. This needs to be taken into account if we
try to use formula (7.38) in numerical calculations. For example, a fairly
expensive name-brand hand calculator comes up with 413403.0005 when asked
to compute $(2 + \sqrt3)^{10}/(3 - \sqrt3)$. This is correct to nine significant figures;
but the true value is slightly less than 413403, not slightly greater. Therefore
it would be a mistake to take the ceiling of 413403.0005; the correct answer,
$U_{20} = 413403$, is obtained by rounding to the nearest integer. Ceilings can
be hazardous.

Margin note: I’ve known slippery floors too.

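
The mutual recurrences and the rounded closed form can be compared directly. The Python sketch below is an added illustration with hypothetical function names; it computes $U_n$ and $V_n$ with exact integers and evaluates $(2 + \sqrt3)^n/(3 - \sqrt3)$ in double-precision floating point, rounding to the nearest integer as the text recommends. The comparison is limited to small n, where floating-point error is far below 1/2.

```python
import math


def domino_counts(limit):
    """Return ([U_0..U_limit], [V_0..V_limit]) for the 3 x n tilings."""
    U = [1, 0]
    V = [0, 1]
    for n in range(2, limit + 1):
        U.append(2 * V[n - 1] + U[n - 2])
        V.append(U[n - 1] + V[n - 2])
    return U[:limit + 1], V[:limit + 1]


def u_even_closed(n):
    """Nearest integer to (2 + sqrt 3)^n / (3 - sqrt 3), an estimate of U_{2n}."""
    root3 = math.sqrt(3)
    return round((2 + root3) ** n / (3 - root3))


U, V = domino_counts(30)
print(U[:8])    # [1, 0, 3, 0, 11, 0, 41, 0]
print(V[:8])    # [0, 1, 0, 4, 0, 15, 0, 56]
print(U[20])    # 413403
print(all(u_even_closed(n) == U[2 * n] for n in range(16)))  # True
```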
Example 4: A closed form for change.
When we left the problem of making change, we had just calculated the
number of ways to pay 50¢. Let’s try now to count the number of ways there
are to change a dollar, or a million dollars, still using only pennies, nickels,
dimes, quarters, and halves.
The generating function derived earlier is

\[ C(z) = \frac{1}{1-z}\,\frac{1}{1-z^5}\,\frac{1}{1-z^{10}}\,\frac{1}{1-z^{25}}\,\frac{1}{1-z^{50}}; \]

this is a rational function of z with a denominator of degree 91. Therefore
we can decompose the denominator into 91 factors and come up with a
91-term “closed form” for $C_n$, the number of ways to give n cents in change.
But that’s too horrible to contemplate. Can’t we do better than the general
method suggests, in this particular case?
One ray of hope suggests itself immediately, when we notice that the
denominator is almost a function of $z^5$. The trick we just used to simplify the
calculations by noting that $1 - 4z^2 + z^4$ is a function of $z^2$ can be applied to
C(z), if we replace $1/(1-z)$ by $(1 + z + z^2 + z^3 + z^4)/(1 - z^5)$:

\[
\begin{aligned}
C(z) &= \frac{1 + z + z^2 + z^3 + z^4}{1 - z^5}\,\frac{1}{1-z^5}\,\frac{1}{1-z^{10}}\,\frac{1}{1-z^{25}}\,\frac{1}{1-z^{50}} \\
&= (1 + z + z^2 + z^3 + z^4)\, \check C(z^5), \\
\check C(z) &= \frac{1}{1-z}\,\frac{1}{1-z}\,\frac{1}{1-z^2}\,\frac{1}{1-z^5}\,\frac{1}{1-z^{10}}.
\end{aligned}
\]

The compressed function $\check C(z)$ has a denominator whose degree is only 19,
so it’s much more tractable than the original. This new expression for C(z)
shows us, incidentally, that $C_{5n} = C_{5n+1} = C_{5n+2} = C_{5n+3} = C_{5n+4}$; and
indeed, this set of equations is obvious in retrospect: The number of ways to
leave a 53¢ tip is the same as the number of ways to leave a 50¢ tip, because
the number of pennies is predetermined modulo 5.
But $\check C(z)$ still doesn’t have a really simple closed form based on the roots
of the denominator. The easiest way to compute the coefficients of $\check C(z)$ is
probably to recognize that each of the denominator factors is a divisor of
$1 - z^{10}$. Hence we can write

\[ \check C(z) = \frac{A(z)}{(1 - z^{10})^5}, \qquad \text{where } A(z) = A_0 + A_1 z + \cdots + A_{31} z^{31}. \tag{7.39} \]

Margin note: Now we’re also getting compressed reasoning.

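The actual value of A(z), for the curious, is

\[
\begin{aligned}
(1 &+ z + \cdots + z^9)^2 (1 + z^2 + \cdots + z^8)(1 + z^5) \\
&= 1 + 2z + 4z^2 + 6z^3 + 9z^4 + 13z^5 + 18z^6 + 24z^7 \\
&\quad + 31z^8 + 39z^9 + 45z^{10} + 52z^{11} + 57z^{12} + 63z^{13} + 67z^{14} + 69z^{15} \\
&\quad + 69z^{16} + 67z^{17} + 63z^{18} + 57z^{19} + 52z^{20} + 45z^{21} + 39z^{22} + 31z^{23} \\
&\quad + 24z^{24} + 18z^{25} + 13z^{26} + 9z^{27} + 6z^{28} + 4z^{29} + 2z^{30} + z^{31}.
\end{aligned}
\]
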
Finally, since
l/(1
-z")~ = xkao
(k14)~'0k,
we can determine the coefficient
of
C,
=
[z”]
C(z) as follows, when n = 1 Oq +
r
and 0 6
r
< 10:
c
lOq+r
=
~Aj(k:4)[10q+r=10k+jl
=
A:(‘:“)
+ A,+Io(‘;~) + A,+zo(~;‘) + A,+~o(‘;‘) .
(7.40)
This gives ten cases, one for each value of $r$; but it’s a pretty good closed
form, compared with alternatives that involve powers of complex numbers.
For example, we can use this expression to deduce the value of
$C_{50q} = \check{C}_{10q}$. Then $r = 0$ and we have

$$C_{50q} = \binom{q+4}{4} + 45\binom{q+3}{4} + 52\binom{q+2}{4} + 2\binom{q+1}{4}.$$

The number of ways to change 50¢ is $\binom{5}{4} + 45\binom{4}{4} = 50$; the number of ways
to change \$1 is $\binom{6}{4} + 45\binom{5}{4} + 52\binom{4}{4} = 292$; and the number of ways to change
\$1,000,000 is

$$\binom{2000004}{4} + 45\binom{2000003}{4} + 52\binom{2000002}{4} + 2\binom{2000001}{4} = 66666793333412666685000001.$$
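The closed form for $C_{50q}$ can be compared with a direct count. The following Python sketch is illustrative only: the function names `ways_dp` and `ways_closed` are ours, not the text's, and the dynamic program is a standard coin-counting routine swapped in as an independent check rather than the method derived above.

```python
from math import comb

COINS = (1, 5, 10, 25, 50)  # pennies, nickels, dimes, quarters, half-dollars

def ways_dp(amount):
    """Count the ways to pay `amount` cents, by direct dynamic programming."""
    ways = [1] + [0] * amount
    for coin in COINS:
        for total in range(coin, amount + 1):
            ways[total] += ways[total - coin]
    return ways[amount]

def ways_closed(q):
    """Closed form for C_{50q}, i.e. (7.40) with r = 0."""
    return (comb(q + 4, 4) + 45 * comb(q + 3, 4)
            + 52 * comb(q + 2, 4) + 2 * comb(q + 1, 4))

print(ways_dp(50), ways_closed(1))     # ways to change 50 cents, two methods
print(ways_dp(100), ways_closed(2))    # ways to change $1, two methods
print(ways_closed(2_000_000))          # ways to change $1,000,000
```

The dynamic program needs time proportional to the amount, while the closed form evaluates four binomial coefficients regardless of $q$.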
Example 5: A divergent series.
Now let’s try to get a closed form for the numbers $g_n$ defined by

$$g_0 = 1; \qquad g_n = n\,g_{n-1}, \quad\text{for } n > 0.$$

After staring at this for a few nanoseconds we realize that $g_n$ is just $n!$; in
fact, the method of summation factors described in Chapter 2 suggests this
answer immediately. But let’s try to solve the recurrence with generating
functions, just to see what happens. (A powerful technique should be able to
handle easy recurrences like this, as well as others that have answers we can’t
guess so easily.)
[Margin note: Nowadays people are talking about femtoseconds.]
The equation

$$g_n = n\,g_{n-1} + [n=0]$$

holds for all $n$, and it leads to

$$G(z) = \sum_n g_n z^n = \sum_n n\,g_{n-1} z^n + \sum_{n=0} z^n.$$

To complete Step 2, we want to express $\sum_n n\,g_{n-1}z^n$ in terms of $G(z)$, and the
basic maneuvers in Table 320 suggest that the derivative $G'(z) = \sum_n n\,g_n z^{n-1}$
is somehow involved. So we steer toward that kind of sum:

$$G(z) = 1 + \sum_n (n+1)\,g_n z^{n+1} = 1 + \sum_n n\,g_n z^{n+1} + \sum_n g_n z^{n+1} = 1 + z^2G'(z) + zG(z).$$
Let’s check this equation, using the values of $g_n$ for small $n$. Since

$$G = 1 + z + 2z^2 + 6z^3 + 24z^4 + \cdots, \qquad G' = 1 + 4z + 18z^2 + 96z^3 + \cdots,$$

we have

$$z^2G' = z^2 + 4z^3 + 18z^4 + 96z^5 + \cdots,$$
$$zG = z + z^2 + 2z^3 + 6z^4 + 24z^5 + \cdots,$$
$$1 = 1.$$

These three lines add up to $G$, so we’re fine so far. Incidentally, we often find
it convenient to write ‘$G$’ instead of ‘$G(z)$’; the extra ‘$(z)$’ just clutters up the
formula when we aren’t changing $z$.
Step 3 is next, and it’s different from what we’ve done before because we
have a differential equation to solve. But this is a differential equation that
we can handle with the hypergeometric series techniques of Section 5.6; those
techniques aren’t too bad. (Readers who are unfamiliar with hypergeometrics
needn’t worry; this will be quick.)
[Margin note: “This will be quick.” That’s what the doctor said just before he
stuck me with that needle. Come to think of it, “hypergeometric” sounds a lot
like “hypodermic.”]
First we must get rid of the constant ‘1’, so we take the derivative of
both sides:

$$G' = (z^2G' + zG + 1)' = (2zG' + z^2G'') + (G + zG') = z^2G'' + 3zG' + G.$$
The theory in Chapter 5 tells us to rewrite this using the $\vartheta$ operator, and we
know from exercise 6.13 that

$$\vartheta G = zG', \qquad \vartheta^2 G = z^2G'' + zG'.$$

Therefore the desired form of the differential equation is

$$\vartheta G = z\vartheta^2G + 2z\vartheta G + zG = z(\vartheta+1)^2G.$$

According to (5.109), the solution with $g_0 = 1$ is the hypergeometric series
$F(1,1;;z)$.
Step 3 was more than we bargained for; but now that we know what the
function $G$ is, Step 4 is easy: the hypergeometric definition (5.76) gives us
the power series expansion

$$G(z) = F(1,1;;z) = \sum_{n\ge0}\frac{1^{\overline{n}}\,1^{\overline{n}}}{n!}\,z^n = \sum_{n\ge0} n!\,z^n.$$

We’ve confirmed the closed form we knew all along, $g_n = n!$.
Notice that the technique gave the right answer even though $G(z)$ diverges
for all nonzero $z$. The sequence $n!$ grows so fast, the terms $|n!\,z^n|$
approach $\infty$ as $n \to \infty$, unless $z = 0$. This shows that formal power series
can be manipulated algebraically without worrying about convergence.
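Because only coefficients matter in a formal power series, the Step 2 equation $G = 1 + z^2G' + zG$ can be checked term by term with no reference to convergence. The sketch below is a minimal illustration; the truncation order `N` and the list names are arbitrary choices of ours.

```python
from math import factorial

N = 12                                   # truncation order (arbitrary)
g = [factorial(n) for n in range(N)]     # coefficients of G(z) = sum of n! z^n

# Coefficients of z^2 G'(z): the term n*g_n*z^(n-1) of G' moves to z^(n+1).
z2_dG = [0, 0] + [n * g[n] for n in range(1, N - 1)]
# Coefficients of z G(z): every coefficient shifts right by one place.
zG = [0] + g[:N - 1]
one = [1] + [0] * (N - 1)

rhs = [a + b + c for a, b, c in zip(one, z2_dG, zG)]
print(rhs == g)   # True
```

Each coefficient comparison reduces to $(k-1)\,(k-1)! + (k-1)! = k!$, which is the recurrence itself.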
Example 6: A recurrence that goes all the way back.
Let’s close this section by applying generating functions to a problem in
graph theory. A fan of order $n$ is a graph on the vertices $\{0, 1, \ldots, n\}$ with
$2n-1$ edges defined as follows: Vertex 0 is connected by an edge to each of
the other $n$ vertices, and vertex $k$ is connected by an edge to vertex $k+1$, for
$1 \le k < n$. Here, for example, is the fan of order 4, which has five vertices
and seven edges.
[Figure: the fan of order 4 on vertices 0, 1, 2, 3, 4.]
The problem of interest: How many spanning trees $f_n$ are in such a graph?
A spanning tree is a subgraph containing all the vertices, and containing
enough edges to make the subgraph connected yet not so many that it has
a cycle. It turns out that every spanning tree of a graph on $n+1$ vertices
has exactly $n$ edges. With fewer than $n$ edges the subgraph wouldn’t be
connected, and with more than $n$ it would have a cycle; graph theory books
prove this.
There are $\binom{2n-1}{n}$ ways to choose $n$ edges from among the $2n-1$ present
in a fan of order $n$, but these choices don’t always yield a spanning tree. For
instance the subgraph
[Figure: a subgraph of the fan of order 4 with edges 0-3, 0-4, 3-4, and 1-2.]
has four edges but is not a spanning tree; it has a cycle from 0 to 4 to 3 to 0,
and it has no connection between $\{1,2\}$ and the other vertices. We want to
count how many of the $\binom{2n-1}{n}$ choices actually do yield spanning trees.
This is a recurrence that “goes all the way back” from $f_{n-1}$ through all
previous values, so it’s different from the other recurrences we’ve seen so far
in this chapter. We used a special method to get rid of a similar right-side
sum in Chapter 2, when we solved the quicksort recurrence (2.12); namely,
we subtracted one instance of the recurrence from another ($f_{n+1} - f_n$). This
trick would get rid of the $\sum$ now, as it did then; but we’ll see that generating
functions allow us to work directly with such sums. (And it’s a good thing
that they do, because we will be seeing much more complicated recurrences
before long.)
Step 1 is finished; Step 2 is where we need to do a new thing:

$$F(z) = \sum_n f_n z^n = \sum_n f_{n-1}z^n + \sum_{k,n} f_k z^n\,[k<n] + \sum_n [n>0]\,z^n$$
$$= zF(z) + \sum_k f_k z^k \sum_n [n>k]\,z^{n-k} + \frac{z}{1-z}$$
$$= zF(z) + F(z)\sum_{m>0} z^m + \frac{z}{1-z}$$
$$= zF(z) + F(z)\,\frac{z}{1-z} + \frac{z}{1-z}.$$

The key trick here was to change $z^n$ to $z^k z^{n-k}$; this made it possible to express
the value of the double sum in terms of $F(z)$, as required in Step 2.
Now Step 3 is simple algebra, and we find

$$F(z) = \frac{z}{1 - 3z + z^2}.$$

Those of us with a zest for memorization will recognize this as the generating
function (7.24) for the even-numbered Fibonacci numbers. So, we needn’t go
through Step 4; we have found a somewhat surprising answer to the
spans-of-fans problem:

$$f_n = F_{2n}, \quad\text{for } n \ge 0. \qquad (7.42)$$
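The answer $f_n = F_{2n}$ can be confirmed for small fans by brute force. This Python sketch is only an illustration: it tries every choice of $n$ edges from the fan's $2n-1$ edges and keeps the acyclic ones, using a simple union-find test. The helper names are ours, and the union-find check is a standard technique brought in for the test, not something the text prescribes.

```python
from itertools import combinations

def fan_edges(n):
    """Edges of the fan of order n on vertices 0..n."""
    spokes = [(0, k) for k in range(1, n + 1)]
    rim = [(k, k + 1) for k in range(1, n)]
    return spokes + rim

def is_spanning_tree(n, edges):
    """True if the n chosen edges contain no cycle (hence span all n+1 vertices)."""
    parent = list(range(n + 1))

    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:          # this edge would close a cycle
            return False
        parent[ru] = rv
    return True               # n acyclic edges on n+1 vertices form a tree

def count_spanning_trees(n):
    return sum(is_spanning_tree(n, c) for c in combinations(fan_edges(n), n))

def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

for n in range(1, 7):
    print(n, count_spanning_trees(n), fib(2 * n))
```

The enumeration examines $\binom{2n-1}{n}$ edge sets, so it is practical only for small $n$; the generating function gives the count for every $n$ at once.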
7.4 SPECIAL GENERATING FUNCTIONS
Step 4 of the four-step procedure becomes much easier if we know
the coefficients of lots of different power series. The expansions in Table 321
are quite useful, as far as they go, but many other types of closed forms are
possible. Therefore we ought to supplement that table with another one,
which lists power series that correspond to the “special numbers” considered
in Chapter 6.
Table 337 is the database we need. The identities in this table are not
difficult to prove, so we needn’t dwell on them; this table is primarily for
reference when we meet a new problem. But there’s a nice proof of the first
formula, (7.43), that deserves mention: We start with the identity

$$\frac{1}{(1-z)^{x+1}} = \sum_n \binom{x+n}{n} z^n$$

and differentiate it with respect to $x$. On the left, $(1-z)^{-x-1}$ is equal to
$e^{(x+1)\ln(1/(1-z))}$, so $d/dx$ contributes a factor of $\ln(1/(1-z))$. On the right,
the numerator of $\binom{x+n}{n}$ is $(x+n)\ldots(x+1)$, and $d/dx$ splits this into $n$ terms
whose sum is equivalent to multiplying $\binom{x+n}{n}$ by

$$\frac{1}{x+n} + \cdots + \frac{1}{x+1} = H_{x+n} - H_x.$$

Replacing $x$ by $m$ gives (7.43). Notice that $H_{x+n} - H_x$ is meaningful even
when $x$ is not an integer.
By the way, this method of differentiating a complicated product (leaving
it as a product) is usually better than expressing the derivative as a sum.
For example the right side of

$$\frac{d}{dx}\Bigl((x+n)^n \ldots (x+1)^1\Bigr) = (x+n)^n \ldots (x+1)^1 \Bigl(\frac{n}{x+n} + \cdots + \frac{1}{x+1}\Bigr)$$

would be a lot messier written out as a sum.
The general identities in Table 337 include many important special cases.
For example, (7.43) simplifies to the generating function for $H_n$ when $m = 0$:

$$\frac{1}{1-z}\ln\frac{1}{1-z} = \sum_n H_n z^n. \qquad (7.57)$$
This equation can also be derived in other ways; for example, we can take the
power series for $\ln(1/(1-z))$ and divide it by $1-z$ to get cumulative sums.
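The coefficient claimed by (7.43), and its special case (7.57), can be checked with exact rational arithmetic by multiplying the two series directly. In this sketch the function names are ours; `lhs_coeff` convolves the coefficients $\binom{m+k}{k}$ of $(1-z)^{-m-1}$ with the coefficients $1/j$ of $\ln(1/(1-z))$.

```python
from fractions import Fraction
from math import comb

def harmonic(n):
    """H_n as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))

def lhs_coeff(m, n):
    """[z^n] of (1-z)^-(m+1) * ln(1/(1-z)), by direct convolution."""
    # [z^k] (1-z)^-(m+1) = C(m+k, k);  [z^j] ln(1/(1-z)) = 1/j for j >= 1.
    return sum(comb(m + k, k) * Fraction(1, n - k) for k in range(n))

def rhs_coeff(m, n):
    """The coefficient (H_{m+n} - H_m) * C(m+n, n) given by (7.43)."""
    return (harmonic(m + n) - harmonic(m)) * comb(m + n, n)

print(all(lhs_coeff(m, n) == rhs_coeff(m, n)
          for m in range(5) for n in range(8)))   # True
```

With `m = 0` the convolution is just the running sum $1/1 + 1/2 + \cdots + 1/n$, which is the cumulative-sum derivation of (7.57).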
Identities (7.51) and (7.52) involve the respective ratios
$\left\{{m \atop m-n}\right\}\big/\binom{m-1}{n}$ and $\left[{m \atop m-n}\right]\big/\binom{m-1}{n}$,
which have the undefined form 0/0 when $n \ge m$. However,
there is a way to give them a proper meaning using the Stirling polynomials
of (6.45), because we have

$$\left\{{m \atop m-n}\right\}\Big/\binom{m-1}{n} = (-1)^{n+1}\,n!\,m\,\sigma_n(n-m); \qquad (7.58)$$
$$\left[{m \atop m-n}\right]\Big/\binom{m-1}{n} = n!\,m\,\sigma_n(m). \qquad (7.59)$$

Thus, for example, the case $m = 1$ of (7.51) should not be regarded as the
power series $\sum_{n\ge0}(z^n/n!)\left\{{1 \atop 1-n}\right\}\big/\binom{0}{n}$, but rather as

$$\frac{z}{\ln(1+z)} = -\sum_{n\ge0}(-z)^n\,\sigma_n(n-1) = 1 + \tfrac12 z - \tfrac1{12}z^2 + \cdots.$$

Identities (7.53), (7.55), (7.54), and (7.56) are “double generating functions”
or “super generating functions” because they have the form
$G(w,z) = \sum_{m,n} g_{m,n}\,w^m z^n$. The coefficient of $w^m$ is a generating function in the
variable $z$; the coefficient of $z^n$ is a generating function in the variable $w$.
7.5 CONVOLUTIONS
[Margin note: I always thought convolution was what happens to my brain when
I try to do a proof.]
The convolution of two given sequences $\langle f_0, f_1, \ldots\rangle = \langle f_n\rangle$ and
$\langle g_0, g_1, \ldots\rangle = \langle g_n\rangle$ is the sequence
$\langle f_0g_0,\ f_0g_1 + f_1g_0,\ \ldots\rangle = \langle\sum_k f_k g_{n-k}\rangle$.
We have observed in Sections 5.4 and 7.2 that convolution of sequences
corresponds to multiplication of their generating functions. This fact makes it
easy to evaluate many sums that would otherwise be difficult to handle.
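Example 1: A Fibonacci convolution.
For example, let’s try to evaluate $\sum_{k=0}^{n} F_kF_{n-k}$ in closed form. This is
the convolution of $\langle F_n\rangle$ with itself, so the sum must be the coefficient of $z^n$
in $F(z)^2$, where $F(z)$ is the generating function for $\langle F_n\rangle$. All we have to do is
figure out the value of this coefficient.
The generating function $F(z)$ is $z/(1-z-z^2)$, a quotient of polynomials; so
the general expansion theorem for rational functions tells us that the answer
can be obtained from a partial fraction representation. We can use the general
expansion theorem (7.30) and grind away; or we can use the fact that

$$F(z)^2 = \Bigl(\frac{1}{\sqrt5}\Bigl(\frac{1}{1-\phi z} - \frac{1}{1-\hat\phi z}\Bigr)\Bigr)^2 = \frac15\Bigl(\frac{1}{(1-\phi z)^2} - \frac{2}{(1-\phi z)(1-\hat\phi z)} + \frac{1}{(1-\hat\phi z)^2}\Bigr)$$
$$= \frac15\sum_{n\ge0}(n+1)\,\phi^n z^n - \frac25\sum_{n\ge0}F_{n+1}z^n + \frac15\sum_{n\ge0}(n+1)\,\hat\phi^n z^n.$$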
Instead of expressing the answer in terms of $\phi$ and $\hat\phi$, let’s try for a closed
form in terms of Fibonacci numbers. Recalling that $\phi + \hat\phi = 1$, we have

$$\phi^n + \hat\phi^n = [z^n]\Bigl(\frac{1}{1-\phi z} + \frac{1}{1-\hat\phi z}\Bigr) = [z^n]\,\frac{2 - (\phi+\hat\phi)z}{(1-\phi z)(1-\hat\phi z)} = [z^n]\,\frac{2-z}{1-z-z^2} = 2F_{n+1} - F_n.$$

Hence

$$F(z)^2 = \frac15\sum_{n\ge0}(n+1)(2F_{n+1} - F_n)\,z^n - \frac25\sum_{n\ge0}F_{n+1}z^n,$$

and we have the answer we seek:

$$\sum_{k=0}^{n} F_kF_{n-k} = \frac{2nF_{n+1} - (n+1)F_n}{5}. \qquad (7.60)$$
For example, when $n = 3$ this formula gives $F_0F_3 + F_1F_2 + F_2F_1 + F_3F_0 = 0 + 1 + 1 + 0 = 2$
on the left and $(6F_4 - 4F_3)/5 = (18 - 8)/5 = 2$ on the right.
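The same comparison can be automated for many values of $n$. The sketch below, with function names of our own choosing, evaluates the convolution sum directly and compares it with the closed form (7.60).

```python
def fib_list(count):
    """Return [F_0, F_1, ..., F_{count-1}]."""
    fibs = [0, 1]
    while len(fibs) < count:
        fibs.append(fibs[-1] + fibs[-2])
    return fibs[:count]

def convolution_sum(n, F):
    """Left side of (7.60): sum of F_k * F_{n-k} for 0 <= k <= n."""
    return sum(F[k] * F[n - k] for k in range(n + 1))

def closed_form(n, F):
    """Right side of (7.60); the numerator is a multiple of 5."""
    return (2 * n * F[n + 1] - (n + 1) * F[n]) // 5

F = fib_list(32)
print(convolution_sum(3, F), closed_form(3, F))   # 2 2
```

The direct sum costs $n$ multiplications per value, while the closed form needs only $F_n$ and $F_{n+1}$.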
Example 2: Harmonic convolutions.
The efficiency of a certain computer method called “samplesort” depends
on the value of the sum

$$T_{m,n} = \sum_{0\le k<n}\binom{k}{m}\frac{1}{n-k}, \quad\text{integers } m, n \ge 0.$$

Exercise 5.58 obtains the value of this sum by a somewhat intricate double
induction, using summation factors. It’s much easier to realize that $T_{m,n}$ is
just the $n$th term in the convolution of $\langle\binom{0}{m}, \binom{1}{m}, \binom{2}{m}, \ldots\rangle$ with
$\langle 0, \frac11, \frac12, \ldots\rangle$.
Both sequences have simple generating functions in Table 321:

$$\sum_{n\ge0}\binom{n}{m}z^n = \frac{z^m}{(1-z)^{m+1}}; \qquad \sum_{n>0}\frac{z^n}{n} = \ln\frac{1}{1-z}.$$

Therefore, by (7.43),

$$T_{m,n} = [z^n]\,\frac{z^m}{(1-z)^{m+1}}\ln\frac{1}{1-z} = [z^{n-m}]\,\frac{1}{(1-z)^{m+1}}\ln\frac{1}{1-z} = (H_n - H_m)\binom{n}{n-m}.$$
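In fact, there are many more sums that boil down to this same sort of
convolution, because we have

$$\frac{1}{(1-z)^{r+1}}\ln\frac{1}{1-z}\cdot\frac{1}{(1-z)^{s+1}} = \frac{1}{(1-z)^{r+s+2}}\ln\frac{1}{1-z}$$

for all $r$ and $s$. Equating coefficients of $z^n$ gives the general identity

$$\sum_k \binom{r+k}{k}\binom{s+n-k}{n-k}(H_{r+k} - H_r) = \binom{r+s+n+1}{n}(H_{r+s+n+1} - H_{r+s+1}). \qquad (7.61)$$

[Margin note: Because it’s so harmonic.]
This seems almost too good to be true. But it checks, at least when $n = 2$:

$$\binom{r+1}{1}\binom{s+1}{1}\frac{1}{r+1} + \binom{r+2}{2}\Bigl(\frac{1}{r+2} + \frac{1}{r+1}\Bigr) = \binom{r+s+3}{2}\Bigl(\frac{1}{r+s+3} + \frac{1}{r+s+2}\Bigr).$$

Special cases like $s = 0$ are as remarkable as the general case.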
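And there’s more. We can use the convolution identity

$$\sum_k \binom{r+k}{k}\binom{s+n-k}{n-k} = \binom{r+s+n+1}{n}$$

to transpose $H_r$ to the other side, since $H_r$ is independent of $k$:

$$\sum_k \binom{r+k}{k}\binom{s+n-k}{n-k}H_{r+k} = \binom{r+s+n+1}{n}(H_{r+s+n+1} - H_{r+s+1} + H_r).$$

There’s still more: If $r$ and $s$ are nonnegative integers $l$ and $m$, we can replace
$\binom{r+k}{k}$ by $\binom{l+k}{l}$ and $\binom{s+n-k}{n-k}$ by $\binom{m+n-k}{m}$; then we can change $k$ to $k-l$ and
$n$ to $n-m-l$, getting

$$\sum_{k=0}^{n}\binom{k}{l}\binom{n-k}{m}H_k = \binom{n+1}{l+m+1}(H_{n+1} - H_{l+m+1} + H_l), \quad\text{integers } l, m, n \ge 0. \qquad (7.63)$$

Even the special case $l = m = 0$ of this identity was difficult for us to handle
in Chapter 2! (See (2.36).) We’ve come a long way.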
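Identity (7.63) is easy to spot-check with exact fractions. The sketch below is illustrative; `H`, `lhs`, and `rhs` are names of ours for the harmonic numbers and the two sides of the identity.

```python
from fractions import Fraction
from math import comb

def H(n):
    """Harmonic number H_n as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))

def lhs(l, m, n):
    """Left side of (7.63)."""
    return sum(comb(k, l) * comb(n - k, m) * H(k) for k in range(n + 1))

def rhs(l, m, n):
    """Right side of (7.63)."""
    return comb(n + 1, l + m + 1) * (H(n + 1) - H(l + m + 1) + H(l))

print(all(lhs(l, m, n) == rhs(l, m, n)
          for l in range(4) for m in range(4) for n in range(9)))   # True
```

With `l = m = 0` the check reduces to the sum of the first harmonic numbers, the case that took real effort in Chapter 2.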
Example 3: Convolutions of convolutions.
If we form the convolution of $\langle f_n\rangle$ and $\langle g_n\rangle$, then convolve this with a
third sequence $\langle h_n\rangle$, we get a sequence whose $n$th term is

$$\sum_{j+k+l=n} f_j\,g_k\,h_l.$$

The generating function of this three-fold convolution is, of course, the
three-fold product $F(z)\,G(z)\,H(z)$. In a similar way, the $m$-fold convolution of a
sequence $\langle g_n\rangle$ with itself has $n$th term equal to

$$\sum_{k_1+k_2+\cdots+k_m=n} g_{k_1}g_{k_2}\ldots g_{k_m}$$

and its generating function is $G(z)^m$.
We can apply these observations to the spans-of-fans problem considered
earlier (Example 6 in Section 7.3). It turns out that there’s another way to
compute $f_n$, the number of spanning trees of an $n$-fan, based on the
configurations of tree edges between the vertices $\{1, 2, \ldots, n\}$: The edge between
vertex $k$ and vertex $k+1$ may or may not be selected for the subtree; and
each of the ways to select these edges connects up certain blocks of adjacent
vertices. For example, when $n = 10$ we might connect vertices $\{1,2\}$, $\{3\}$,
$\{4,5,6,7\}$, and $\{8,9,10\}$:
[Margin note: Concrete blocks.]
[Figure: vertices 1 through 10 joined into the blocks {1,2}, {3}, {4,5,6,7}, {8,9,10}, with vertex 0 not yet connected.]
How many spanning trees can we make, by adding additional edges to
vertex 0? We need to connect 0 to each of the four blocks; and there are two
ways to join 0 with $\{1,2\}$, one way to join it with $\{3\}$, four ways with $\{4,5,6,7\}$,
and three ways with $\{8,9,10\}$, or $2\cdot1\cdot4\cdot3 = 24$ ways altogether. Summing
over all possible ways to make blocks gives us the following expression for the
total number of spanning trees:
$$f_n = \sum_{m>0}\ \sum_{\substack{k_1+k_2+\cdots+k_m=n \\ k_1,k_2,\ldots,k_m>0}} k_1k_2\ldots k_m. \qquad (7.64)$$

For example, $f_4 = 4 + 3\cdot1 + 2\cdot2 + 1\cdot3 + 2\cdot1\cdot1 + 1\cdot2\cdot1 + 1\cdot1\cdot2 + 1\cdot1\cdot1\cdot1 = 21$.
This is the sum of $m$-fold convolutions of the sequence $\langle 0, 1, 2, 3, \ldots\rangle$, for
$m = 1, 2, 3, \ldots$; hence the generating function for $\langle f_n\rangle$ is

$$F(z) = G(z) + G(z)^2 + G(z)^3 + \cdots = \frac{G(z)}{1 - G(z)}$$

where $G(z)$ is the generating function for $\langle 0, 1, 2, 3, \ldots\rangle$, namely $z/(1-z)^2$.
Consequently we have

$$F(z) = \frac{z}{(1-z)^2 - z} = \frac{z}{1 - 3z + z^2},$$

as before. This approach to $\langle f_n\rangle$ is more symmetrical and appealing than the
complicated recurrence we had earlier.
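Formula (7.64) translates directly into a sum over the ordered ways to split $n$ into positive block sizes. The following sketch is an illustration under that reading; `compositions` and `fan_trees` are names we introduce here.

```python
from math import prod

def compositions(n):
    """Yield every tuple of positive integers summing to n, order mattering."""
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest

def fan_trees(n):
    """Evaluate (7.64) for n >= 1: sum of k1*k2*...*km over all block sizes."""
    return sum(prod(c) for c in compositions(n))

print(fan_trees(4))                           # 21
print([fan_trees(n) for n in range(1, 9)])    # F_2, F_4, ..., F_16
```

There are $2^{n-1}$ compositions of $n$, so this direct evaluation grows exponentially; the closed form $z/(1-3z+z^2)$ avoids that.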
Example 4: A convoluted recurrence.
Our next example is especially important; in fact, it’s the “classic example”
of why generating functions are useful in the solution of recurrences.
Suppose we have $n+1$ variables $x_0$, $x_1$, ..., $x_n$ whose product is to be
computed by doing $n$ multiplications. How many ways $C_n$ are there to insert
parentheses into the product $x_0\cdot x_1\cdot\ldots\cdot x_n$ so that the order of multiplication is
completely specified? For example, when $n = 2$ there are two ways, $x_0\cdot(x_1\cdot x_2)$
and $(x_0\cdot x_1)\cdot x_2$. And when $n = 3$ there are five ways,

$$x_0\cdot(x_1\cdot(x_2\cdot x_3)),\quad x_0\cdot((x_1\cdot x_2)\cdot x_3),\quad (x_0\cdot x_1)\cdot(x_2\cdot x_3),\quad (x_0\cdot(x_1\cdot x_2))\cdot x_3,\quad ((x_0\cdot x_1)\cdot x_2)\cdot x_3.$$

Thus $C_2 = 2$, $C_3 = 5$; we also have $C_1 = 1$ and $C_0 = 1$.
Let’s use the four-step procedure of Section 7.3. What is a recurrence
for the $C$’s? The key observation is that there’s exactly one ‘$\cdot$’ operation
outside all of the parentheses, when $n > 0$; this is the final multiplication
that ties everything together. If this ‘$\cdot$’ occurs between $x_k$ and $x_{k+1}$, there
are $C_k$ ways to fully parenthesize $x_0\cdot\ldots\cdot x_k$, and there are $C_{n-k-1}$ ways to
fully parenthesize $x_{k+1}\cdot\ldots\cdot x_n$; hence

$$C_n = C_0C_{n-1} + C_1C_{n-2} + \cdots + C_{n-1}C_0, \quad\text{if } n > 0.$$

By now we recognize this expression as a convolution, and we know how to
patch the formula so that it holds for all integers $n$:

$$C_n = \sum_k C_kC_{n-1-k} + [n=0]. \qquad (7.65)$$
Step 1 is now complete. Step 2 tells us to multiply by $z^n$ and sum:

$$C(z) = \sum_n C_n z^n = \sum_{k,n} C_kC_{n-1-k}\,z^n + \sum_{n=0} z^n$$
$$= \sum_k C_k z^k \sum_n C_{n-1-k}\,z^{n-k} + 1$$
$$= C(z)\cdot zC(z) + 1.$$

Lo and behold, the convolution has become a product, in the
generating-function world. Life is full of surprises.
[Margin note: The authors jest.]
Step 3 is also easy. We solve for $C(z)$ by the quadratic formula:

$$C(z) = \frac{1 \pm \sqrt{1-4z}}{2z}.$$

But should we choose the + sign or the − sign? Both choices yield a function
that satisfies $C(z) = zC(z)^2 + 1$, but only one of the choices is suitable for our
problem. We might choose the + sign on the grounds that positive thinking
is best; but we soon discover that this choice gives $C(0) = \infty$, contrary to
the facts. (The correct function $C(z)$ is supposed to have $C(0) = C_0 = 1$.)
Therefore we conclude that

$$C(z) = \frac{1 - \sqrt{1-4z}}{2z}.$$
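Finally, Step 4. What is $[z^n]\,C(z)$? The binomial theorem tells us that

$$\sqrt{1-4z} = \sum_{k\ge0}\binom{1/2}{k}(-4z)^k = 1 + \sum_{k\ge1}\frac{1}{2k}\binom{-1/2}{k-1}(-4z)^k;$$

hence, using (5.37),

$$\frac{1 - \sqrt{1-4z}}{2z} = \sum_{k\ge1}\frac1k\binom{-1/2}{k-1}(-4z)^{k-1} = \sum_{n\ge0}\binom{-1/2}{n}\frac{(-4z)^n}{n+1} = \sum_{n\ge0}\binom{2n}{n}\frac{z^n}{n+1}.$$

The number of ways to parenthesize, $C_n$, is $\binom{2n}{n}\frac{1}{n+1}$.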
We anticipated this result in Chapter 5, when we introduced the sequence
of Catalan numbers $\langle 1, 1, 2, 5, 14, \ldots\rangle = \langle C_n\rangle$. This sequence arises in dozens
of problems that seem at first to be unrelated to each other [41], because
many situations have a recursive structure that corresponds to the convolution
recurrence (7.65).
[Margin note: So the convoluted recurrence has led us to an oft-recurring
convolution.]
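The convolution recurrence (7.65) and the closed form $\binom{2n}{n}\frac{1}{n+1}$ can be compared numerically. In this short sketch the function names are ours; the `[n=0]` bracket of the recurrence is written as the Python expression `(n == 0)`.

```python
from math import comb

def catalan_by_recurrence(count):
    """C_0..C_{count-1} from (7.65): C_n = sum_k C_k C_{n-1-k} + [n = 0]."""
    C = []
    for n in range(count):
        C.append(sum(C[k] * C[n - 1 - k] for k in range(n)) + (n == 0))
    return C

def catalan_closed(n):
    """Closed form from Step 4: C(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)

print(catalan_by_recurrence(8))
print([catalan_closed(n) for n in range(8)])
```

The recurrence needs all earlier values to produce the next one, whereas the closed form gives any single $C_n$ directly.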
For example, let’s consider the following problem: How many sequences
$\langle a_1, a_2, \ldots, a_{2n}\rangle$ of $+1$’s and $-1$’s have the property that

$$a_1 + a_2 + \cdots + a_{2n} = 0$$

and have all their partial sums

$$a_1, \quad a_1 + a_2, \quad \ldots, \quad a_1 + a_2 + \cdots + a_{2n}$$

nonnegative? There must be $n$ occurrences of $+1$ and $n$ occurrences of $-1$.
We can represent this problem graphically by plotting the sequence of partial
sums $s_n = \sum_{k=1}^{n} a_k$ as a function of $n$: The five solutions for $n = 3$ are
[Figure: the five solutions for n = 3, drawn as mountain ranges.]
These are “mountain ranges” of width $2n$ that can be drawn with line segments
of the forms / and \. It turns out that there are exactly $C_n$ ways to
do this, and the sequences can be related to the parenthesis problem in the
following way: Put an extra pair of parentheses around the entire formula, so
that there are $n$ pairs of parentheses corresponding to the $n$ multiplications.
Now replace each ‘$\cdot$’ by $+1$ and each ‘)’ by $-1$ and erase everything else.
For example, the formula $x_0\cdot((x_1\cdot x_2)\cdot(x_3\cdot x_4))$ corresponds to the sequence
$\langle +1, +1, -1, +1, +1, -1, -1, -1\rangle$ by this rule. The five ways to parenthesize
$x_0\cdot x_1\cdot x_2\cdot x_3$ correspond to the five mountain ranges for $n = 3$ shown above.
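The replacement rule just described is mechanical enough to write down as code. This sketch assumes the multiplication dot is typed as `*` (an ASCII stand-in of ours) and that the input formula is already fully parenthesized except for the outermost pair; the function names are illustrative.

```python
def to_steps(formula):
    """Map a parenthesized product to its sequence of +1's and -1's."""
    wrapped = "(" + formula + ")"        # the extra outer pair of parentheses
    steps = []
    for ch in wrapped:
        if ch == "*":                    # each multiplication becomes +1
            steps.append(+1)
        elif ch == ")":                  # each right parenthesis becomes -1
            steps.append(-1)
    return steps                         # everything else is erased

def is_mountain_range(steps):
    """True if the total is 0 and every partial sum is nonnegative."""
    height = 0
    for s in steps:
        height += s
        if height < 0:
            return False
    return height == 0

print(to_steps("x0*((x1*x2)*(x3*x4))"))
# [1, 1, -1, 1, 1, -1, -1, -1]
```

Applying `to_steps` to the five parenthesizations of four variables yields five different sequences, each accepted by `is_mountain_range`.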
Moreover, a slight reformulation of our sequence-counting problem leads
to a surprisingly simple combinatorial solution that avoids the use of
generating functions: How many sequences $\langle a_0, a_1, a_2, \ldots, a_{2n}\rangle$ of $+1$’s and $-1$’s
have the property that

$$a_0 + a_1 + a_2 + \cdots + a_{2n} = 1,$$

when all the partial sums

$$a_0, \quad a_0 + a_1, \quad a_0 + a_1 + a_2, \quad \ldots, \quad a_0 + a_1 + \cdots + a_{2n}$$

are required to be positive? Clearly these are just the sequences of the
previous problem, with the additional element $a_0 = +1$ placed in front. But
the sequences in the new problem can be enumerated by a simple counting
argument, using a remarkable fact discovered by George Raney [243] in 1959:
If $\langle x_1, x_2, \ldots, x_m\rangle$ is any sequence of integers whose sum is $+1$, exactly one of
the cyclic shifts

$$\langle x_1, x_2, \ldots, x_m\rangle,\quad \langle x_2, \ldots, x_m, x_1\rangle,\quad \ldots,\quad \langle x_m, x_1, \ldots, x_{m-1}\rangle$$

has all of its partial sums positive.
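For example, consider the sequence $\langle 3, -5, 2, -2, 3, 0\rangle$. Its cyclic shifts are

$\langle 3, -5, 2, -2, 3, 0\rangle$
$\langle -5, 2, -2, 3, 0, 3\rangle$
$\langle 2, -2, 3, 0, 3, -5\rangle$
$\langle -2, 3, 0, 3, -5, 2\rangle$
$\langle 3, 0, 3, -5, 2, -2\rangle$ ✓
$\langle 0, 3, -5, 2, -2, 3\rangle$

and only the one that’s checked has entirely positive partial sums.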
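Raney's lemma is simple to test on examples. The sketch below lists every cyclic shift of a sequence and keeps those whose partial sums are all positive; `positive_shifts` is a name we introduce for illustration.

```python
from itertools import accumulate

def positive_shifts(seq):
    """Return the cyclic shifts of seq whose partial sums are all positive."""
    seq = list(seq)
    shifts = [seq[i:] + seq[:i] for i in range(len(seq))]
    return [s for s in shifts if all(p > 0 for p in accumulate(s))]

print(positive_shifts([3, -5, 2, -2, 3, 0]))
# [[3, 0, 3, -5, 2, -2]]
```

For any integer sequence with sum $+1$ the returned list should have exactly one entry, which is what the lemma asserts.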
Raney’s lemma can be proved by a simple geometric argument. Let’s
extend the sequence periodically to get an infinite sequence

$$\langle x_1, x_2, \ldots, x_m, x_1, x_2, \ldots, x_m, x_1, x_2, \ldots\rangle;$$

thus we let $x_{m+k} = x_k$ for all $k \ge 0$. If we now plot the partial sums
$s_n = x_1 + \cdots + x_n$ as a function of $n$, the graph of $s_n$ has an “average slope” of
$1/m$, because $s_{m+n} = s_n + 1$. For example, the graph corresponding to our
example sequence $\langle 3, -5, 2, -2, 3, 0, 3, -5, 2, \ldots\rangle$ begins as follows:
[Figure: graph of the partial sums $s_n$ for the periodic example sequence, lying between two bounding lines of slope 1/6.]
The entire graph can be contained between two lines of slope $1/m$, as shown;
we have $m = 6$ in the illustration. In general these bounding lines touch the
graph just once in each cycle of $m$ points, since lines of slope $1/m$ hit points
with integer coordinates only once per $m$ units. The unique lower point of
intersection is the only place in the cycle from which all partial sums will
be positive, because every other point on the curve has an intersection point
within $m$ units to its right.
With Raney’s lemma we can easily enumerate the sequences $\langle a_0, \ldots, a_{2n}\rangle$
of $+1$’s and $-1$’s whose partial sums are entirely positive and whose total
sum is $+1$. There are $\binom{2n+1}{n}$ sequences with $n$ occurrences of $-1$ and $n+1$
occurrences of $+1$, and Raney’s lemma tells us that exactly $1/(2n+1)$ of
these sequences have all partial sums positive. (List all $N = \binom{2n+1}{n}$ of these
sequences and all $2n+1$ of their cyclic shifts, in an $N \times (2n+1)$ array. Each
row contains exactly one solution. Each solution appears exactly once in each
column. So there are $N/(2n+1)$ distinct solutions in the array, each appearing
$(2n+1)$ times.) The total number of sequences with positive partial sums is

$$\binom{2n+1}{n}\frac{1}{2n+1} = \binom{2n}{n}\frac{1}{n+1} = C_n.$$

[Margin note: Ah, if stock prices would only continue to rise like this.]
[Margin note: (Attention, computer scientists: The partial sums in this problem
represent the stack size as a function of time, when a product of $n+1$ factors
is evaluated, because each “push” operation changes the size by $+1$ and
each multiplication changes it by $-1$.)]
Example 5: A recurrence with m-fold convolution.
We can generalize the problem just considered by looking at sequences
$\langle a_0, \ldots, a_{mn}\rangle$ of $+1$’s and $(1-m)$’s whose partial sums are all positive and
whose total sum is $+1$. Such sequences can be called $m$-Raney sequences. If
there are $k$ occurrences of $(1-m)$ and $mn+1-k$ occurrences of $+1$, we have

$$k(1-m) + (mn+1-k) = 1,$$

hence $k = n$. There are $\binom{mn+1}{n}$ sequences with $n$ occurrences of $(1-m)$ and
$mn+1-n$ occurrences of $+1$, and Raney’s lemma tells us that the number
of such sequences with all partial sums positive is exactly

$$\binom{mn+1}{n}\frac{1}{mn+1} = \binom{mn}{n}\frac{1}{(m-1)n+1}. \qquad (7.66)$$

[Margin note: (Attention, computer scientists: The stack interpretation now
applies with respect to an $m$-ary operation, instead of the binary multiplication
considered earlier.)]
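For small $m$ and $n$ the count (7.66) can be confirmed by exhaustive enumeration. The sketch below generates every sequence of $+1$'s and $(1-m)$'s of length $mn+1$ and keeps the $m$-Raney ones; the function names are ours, and the enumeration is exponential, so it serves only as a check.

```python
from itertools import accumulate, product
from math import comb

def count_m_raney(m, n):
    """Brute-force count of m-Raney sequences of length m*n + 1."""
    length = m * n + 1
    return sum(
        1
        for seq in product((1, 1 - m), repeat=length)
        if sum(seq) == 1 and all(p > 0 for p in accumulate(seq))
    )

def fuss_catalan(m, n):
    """Right-hand side of (7.66): C(mn, n) / ((m-1)n + 1)."""
    return comb(m * n, n) // ((m - 1) * n + 1)

for m in (2, 3, 4):
    print(m, [count_m_raney(m, n) for n in range(4)],
          [fuss_catalan(m, n) for n in range(4)])
```

With `m = 2` both lists begin 1, 1, 2, 5, the ordinary Catalan numbers.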
So this is the number of $m$-Raney sequences. Let’s call this a Fuss-Catalan
number $C_n^{(m)}$, because the sequence $\langle C_n^{(m)}\rangle$ was first investigated by N. I.
Fuss [109] in 1791 (many years before Catalan himself got into the act). The
ordinary Catalan numbers are $C_n = C_n^{(2)}$.
Now that we know the answer, (7.66), let’s play “Jeopardy” and figure
out a question that leads to it. In the case $m = 2$ the question was: “What
numbers $C_n$ satisfy the recurrence $C_n = \sum_k C_kC_{n-1-k} + [n=0]$?” We will
try to find a similar question (a similar recurrence) in the general case.
The trivial sequence $\langle +1\rangle$ of length 1 is clearly an $m$-Raney sequence. If
we put the number $(1-m)$ at the right of any $m$ sequences that are $m$-Raney,
we get an $m$-Raney sequence; the partial sums stay positive as they increase
to $+2$, then $+3$, ..., $+m$, and $+1$. Conversely, we can show that all $m$-Raney
sequences $\langle a_0, \ldots, a_{mn}\rangle$ arise in this way, if $n > 0$: The last term $a_{mn}$ must
be $(1-m)$. The partial sums $s_j = a_0 + \cdots + a_{j-1}$ are positive for $1 \le j \le mn$,
and $s_{mn} = m$ because $s_{mn} + a_{mn} = 1$. Let $k_1$ be the largest index $\le mn$ such
that $s_{k_1} = 1$; let $k_2$ be largest such that $s_{k_2} = 2$; and so on. Thus $s_{k_j} = j$
and $s_k > j$, for $k_j < k \le mn$ and $1 \le j \le m$. It follows that $k_m = mn$, and
we can verify without difficulty that each of the subsequences
$\langle a_0, \ldots, a_{k_1-1}\rangle$, $\langle a_{k_1}, \ldots, a_{k_2-1}\rangle$, ..., $\langle a_{k_{m-1}}, \ldots, a_{k_m-1}\rangle$
is an $m$-Raney sequence. We must have $k_1 = mn_1 + 1$, $k_2 - k_1 = mn_2 + 1$, ...,
$k_m - k_{m-1} = mn_m + 1$, for some nonnegative integers $n_1$, $n_2$, ..., $n_m$.
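Therefore $\binom{mn+1}{n}\frac{1}{mn+1}$ is the answer to the following two interesting
questions: “What are the numbers $C_n^{(m)}$ defined by the recurrence

$$C_n^{(m)} = \Bigl(\sum_{n_1+n_2+\cdots+n_m=n-1} C_{n_1}^{(m)}C_{n_2}^{(m)}\ldots C_{n_m}^{(m)}\Bigr) + [n=0] \qquad (7.67)$$

for all integers $n$?”
“If $G(z)$ is a power series that satisfies

$$G(z) = z\,G(z)^m + 1, \qquad (7.68)$$

what is $[z^n]\,G(z)$?”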
Notice that these are not easy questions. In the ordinary Catalan case
($m = 2$), we solved (7.68) for $G(z)$ and its coefficients by using the quadratic
formula and the binomial theorem; but when $m = 3$, none of the standard
techniques gives any clue about how to solve the cubic equation $G = zG^3 + 1$.
So it has turned out to be easier to answer this question before asking it.
Now, however, we know enough to ask even harder questions and deduce
their answers. How about this one: “What is $[z^n]\,G(z)^l$, if $l$ is a positive
integer and if $G(z)$ is the power series defined by (7.68)?” The argument we
just gave can be used to show that $[z^n]\,G(z)^l$ is the number of sequences of
length $mn + l$ with the following three properties:
• Each element is either $+1$ or $(1-m)$.
• The partial sums are all positive.
• The total sum is $l$.
For we get all such sequences in a unique way by putting together $l$ sequences
that have the $m$-Raney property. The number of ways to do this is

$$\sum_{n_1+n_2+\cdots+n_l=n} C_{n_1}^{(m)}C_{n_2}^{(m)}\ldots C_{n_l}^{(m)} = [z^n]\,G(z)^l.$$
Raney proved a generalization of his lemma that tells us how to count
such sequences: If $\langle x_1, x_2, \ldots, x_m\rangle$ is any sequence of integers with $x_j \le 1$ for
all $j$, and with $x_1 + x_2 + \cdots + x_m = l > 0$, then exactly $l$ of the cyclic shifts

$$\langle x_1, x_2, \ldots, x_m\rangle,\quad \langle x_2, \ldots, x_m, x_1\rangle,\quad \ldots,\quad \langle x_m, x_1, \ldots, x_{m-1}\rangle$$

have all positive partial sums.
For example, we can check this statement on the sequence
$\langle -2, 1, -1, 0, 1, 1, -1, 1, 1, 1\rangle$. The cyclic shifts are

$\langle -2, 1, -1, 0, 1, 1, -1, 1, 1, 1\rangle$
$\langle 1, -1, 0, 1, 1, -1, 1, 1, 1, -2\rangle$
$\langle -1, 0, 1, 1, -1, 1, 1, 1, -2, 1\rangle$
$\langle 0, 1, 1, -1, 1, 1, 1, -2, 1, -1\rangle$
$\langle 1, 1, -1, 1, 1, 1, -2, 1, -1, 0\rangle$ ✓
$\langle 1, -1, 1, 1, 1, -2, 1, -1, 0, 1\rangle$
$\langle -1, 1, 1, 1, -2, 1, -1, 0, 1, 1\rangle$
$\langle 1, 1, 1, -2, 1, -1, 0, 1, 1, -1\rangle$ ✓
$\langle 1, 1, -2, 1, -1, 0, 1, 1, -1, 1\rangle$
$\langle 1, -2, 1, -1, 0, 1, 1, -1, 1, 1\rangle$

and only the two examples marked ‘✓’ have all partial sums positive. This
generalized lemma is proved in exercise 13.
A sequence of $+1$’s and $(1-m)$’s that has length $mn + l$ and total sum $l$
must have exactly $n$ occurrences of $(1-m)$. The generalized lemma tells
us that $l/(mn+l)$ of these $\binom{mn+l}{n}$ sequences have all partial sums positive;
hence our tough question has a surprisingly simple answer:

$$[z^n]\,G(z)^l = \binom{mn+l}{n}\frac{l}{mn+l},$$

for all integers $l > 0$.
Readers who haven’t forgotten Chapter 5 might well be experiencing déjà
vu: “That formula looks familiar; haven’t we seen it before?” Yes, indeed;
equation (5.60) says that

$$[z^n]\,\mathcal{B}_t(z)^r = \binom{tn+r}{n}\frac{r}{tn+r}.$$

Therefore the generating function $G(z)$ in (7.68) must actually be the
generalized binomial series $\mathcal{B}_m(z)$. Sure enough, equation (5.59) says

$$\mathcal{B}_m(z)^{1-m} - \mathcal{B}_m(z)^{-m} = z,$$

which is the same as

$$\mathcal{B}_m(z) - 1 = z\,\mathcal{B}_m(z)^m.$$

Let’s switch to the notation of Chapter 5, now that we know we’re dealing
with generalized binomials. Chapter 5 stated a bunch of identities without
proof. We have now closed part of the gap by proving that the power series
$\mathcal{B}_t(z)$ defined by

$$\mathcal{B}_t(z) = \sum_n \binom{tn+1}{n}\frac{z^n}{tn+1}$$

has the remarkable property that

$$\mathcal{B}_t(z)^r = \sum_n \binom{tn+r}{n}\frac{r\,z^n}{tn+r},$$

whenever $t$ and $r$ are positive integers.
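The coefficient formula for powers of the series satisfying $G(z) = z\,G(z)^m + 1$ can be checked with truncated integer arithmetic. The sketch below solves the equation by repeated substitution, which fixes at least one more coefficient per pass; all function names and the truncation order are choices of ours.

```python
from math import comb

def mul(a, b, N):
    """Product of two power series, truncated to N coefficients."""
    c = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                c[i + j] += ai * bj
    return c

def power(a, e, N):
    """e-th power of a power series, truncated to N coefficients."""
    result = [1] + [0] * (N - 1)
    for _ in range(e):
        result = mul(result, a, N)
    return result

def solve_G(m, N):
    """First N coefficients of the series with G = z*G^m + 1, by iteration."""
    G = [1] + [0] * (N - 1)
    for _ in range(N):                      # each pass fixes one more coefficient
        G = [1] + power(G, m, N)[:N - 1]    # G <- 1 + z * G^m
    return G

def claimed(m, l, n):
    """The coefficient C(mn+l, n) * l / (mn+l) of z^n in G(z)^l."""
    return comb(m * n + l, n) * l // (m * n + l)

N = 8
G3 = solve_G(3, N)
print(power(G3, 2, N))
print([claimed(3, 2, n) for n in range(N)])
```

For `m = 2` and `l = 1` the coefficients are the Catalan numbers, in agreement with Example 4.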
Can we extend these results to arbitrary values of $t$ and $r$? Yes; because
the coefficients $\binom{tn+r}{n}\frac{r}{tn+r}$ are polynomials in $t$ and $r$. The general $r$th power
defined by

$$\mathcal{B}_t(z)^r = e^{r\ln\mathcal{B}_t(z)} = \sum_{n\ge0}\frac{(r\ln\mathcal{B}_t(z))^n}{n!} = \sum_{n\ge0}\frac{r^n}{n!}\Bigl(-\sum_{m\ge1}\frac{(1-\mathcal{B}_t(z))^m}{m}\Bigr)^n$$

has coefficients that are polynomials in $t$ and $r$; and those polynomials are
equal to $\binom{tn+r}{n}\frac{r}{tn+r}$ for infinitely many values of $t$ and $r$. So the two sequences
of polynomials must be identically equal.
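Chapter 5 also mentions the generalized exponential series

$$\mathcal{E}_t(z) = \sum_{n\ge0}\frac{(tn+1)^{n-1}}{n!}\,z^n,$$

which is said in (5.60) to have an equally remarkable property:

$$[z^n]\,\mathcal{E}_t(z)^r = \frac{r\,(tn+r)^{n-1}}{n!}.$$

We can prove this as a limiting case of the formulas for $\mathcal{B}_t(z)$, because it is
not difficult to show that

$$\mathcal{E}_t(z)^r = \lim_{x\to\infty}\mathcal{B}_{xt}(z/x)^{xr}.$$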
7.6 EXPONENTIAL GF’S
Sometimes a sequence $\langle g_n\rangle$ has a generating function whose properties
are quite complicated, while the related sequence $\langle g_n/n!\rangle$ has a generating
function that’s quite simple. In such cases we naturally prefer to work with
$\langle g_n/n!\rangle$ and then multiply by $n!$ at the end. This trick works sufficiently
often that we have a special name for it: We call the power series

$$\hat{G}(z) = \sum_{n\ge0} g_n\frac{z^n}{n!} \qquad (7.71)$$

the exponential generating function or “egf” of the sequence $\langle g_0, g_1, g_2, \ldots\rangle$.
This name arises because the exponential function $e^z$ is the egf of $\langle 1, 1, 1, \ldots\rangle$.
Many of the generating functions in Table 337 are actually egf’s. For
example, equation (7.50) says that $\bigl(\ln\frac{1}{1-z}\bigr)^m/m!$ is the egf for the sequence
$\langle\left[{0 \atop m}\right], \left[{1 \atop m}\right], \left[{2 \atop m}\right], \ldots\rangle$. The ordinary generating function for this sequence is
much more complicated (and also divergent).
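Exponential generating functions have their own basic maneuvers, analogous
to the operations we learned in Section 7.2. For example, if we multiply
the egf of $\langle g_n\rangle$ by $z$, we get

$$\sum_{n\ge0} g_n\frac{z^{n+1}}{n!} = \sum_{n\ge1} g_{n-1}\frac{z^n}{(n-1)!} = \sum_{n\ge0} n\,g_{n-1}\frac{z^n}{n!};$$

this is the egf of $\langle 0, g_0, 2g_1, \ldots\rangle = \langle n\,g_{n-1}\rangle$.
[Margin note: Are we having fun yet?]
Differentiating the egf of $\langle g_0, g_1, g_2, \ldots\rangle$ with respect to $z$ gives

$$\sum_{n\ge0} n\,g_n\frac{z^{n-1}}{n!} = \sum_{n\ge1} g_n\frac{z^{n-1}}{(n-1)!} = \sum_{n\ge0} g_{n+1}\frac{z^n}{n!}; \qquad (7.72)$$

this is the egf of $\langle g_1, g_2, \ldots\rangle$. Thus differentiation on egf’s corresponds to the
left-shift operation $(G(z) - g_0)/z$ on ordinary gf’s. (We used this left-shift
property of egf’s when we studied hypergeometric series, (5.106).) Integration
of an egf gives

$$\int_0^z \sum_{n\ge0} g_n\frac{t^n}{n!}\,dt = \sum_{n\ge0} g_n\frac{z^{n+1}}{(n+1)!} = \sum_{n\ge1} g_{n-1}\frac{z^n}{n!}; \qquad (7.73)$$

this is a right shift, the egf of $\langle 0, g_0, g_1, \ldots\rangle$.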
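The most interesting operation on egf’s, as on ordinary gf’s, is multiplication.
If $\hat{F}(z)$ and $\hat{G}(z)$ are egf’s for $\langle f_n\rangle$ and $\langle g_n\rangle$, then $\hat{F}(z)\hat{G}(z) = \hat{H}(z)$ is
the egf for a sequence $\langle h_n\rangle$ called the binomial convolution of $\langle f_n\rangle$ and $\langle g_n\rangle$:

$$h_n = \sum_k \binom{n}{k} f_k\,g_{n-k}.$$

Binomial coefficients appear here because $\binom{n}{k} = n!/k!\,(n-k)!$, hence

$$\frac{h_n}{n!} = \sum_{k=0}^{n}\frac{f_k}{k!}\,\frac{g_{n-k}}{(n-k)!};$$

in other words, $\langle h_n/n!\rangle$ is the ordinary convolution of $\langle f_n/n!\rangle$ and $\langle g_n/n!\rangle$.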
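The relation between binomial and ordinary convolution can be seen in a few lines of code. This is a minimal sketch with names of our own; the two test sequences `f` and `g` are arbitrary, chosen only to exercise the identity.

```python
from fractions import Fraction
from math import comb, factorial

def binomial_convolution(f, g):
    """h_n = sum_k C(n, k) * f_k * g_{n-k}."""
    n_terms = min(len(f), len(g))
    return [sum(comb(n, k) * f[k] * g[n - k] for k in range(n + 1))
            for n in range(n_terms)]

def ordinary_convolution(a, b):
    """c_n = sum_k a_k * b_{n-k}."""
    n_terms = min(len(a), len(b))
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(n_terms)]

ones = [1] * 8                               # the sequence whose egf is e^z
print(binomial_convolution(ones, ones))      # [1, 2, 4, 8, 16, 32, 64, 128]

f = [n * n + 1 for n in range(8)]            # arbitrary test sequences
g = [3 * n - 2 for n in range(8)]
h = binomial_convolution(f, g)
scaled = ordinary_convolution(
    [Fraction(x, factorial(n)) for n, x in enumerate(f)],
    [Fraction(x, factorial(n)) for n, x in enumerate(g)],
)
print(all(Fraction(h[n], factorial(n)) == scaled[n] for n in range(8)))   # True
```

The first printout reflects $e^z \cdot e^z = e^{2z}$, whose egf coefficients are the powers of 2.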
Binomial convolutions occur frequently in applications. For example, we
defined the Bernoulli numbers in (6.79) by the implicit recurrence

$$\sum_{j=0}^{m}\binom{m+1}{j}B_j = [m=0], \quad\text{for all } m \ge 0;$$

this can be rewritten as a binomial convolution, if we substitute $n$ for $m+1$
and add the term $B_n$ to both sides:

$$\sum_k \binom{n}{k}B_k = B_n + [n=1], \quad\text{for all } n \ge 0. \qquad (7.75)$$

We can now relate this recurrence to power series (as promised in Chapter 6)
by introducing the egf for Bernoulli numbers, $\hat{B}(z) = \sum_{n\ge0} B_n z^n/n!$. The
left-hand side of (7.75) is the binomial convolution of $\langle B_n\rangle$ with the constant
sequence $\langle 1, 1, 1, \ldots\rangle$; hence the egf of the left-hand side is $\hat{B}(z)e^z$. The egf
of the right-hand side is $\sum_{n\ge0}(B_n + [n=1])\,z^n/n! = \hat{B}(z) + z$. Therefore we
must have $\hat{B}(z) = z/(e^z - 1)$; we have proved equation (6.81), which appears
also in Table 337 as equation (7.44).
Now let’s look again at a sum that has been popping up frequently in
this book,

$$S_m(n) = 0^m + 1^m + 2^m + \cdots + (n-1)^m = \sum_{0\le k<n} k^m.$$

This time we will try to analyze the problem with generating functions, in
hopes that it will suddenly become simpler. We will consider $n$ to be fixed
and $m$ variable; thus our goal is to understand the coefficients of the power
series

$$S(z) = S_0(n) + S_1(n)z + S_2(n)z^2 + \cdots = \sum_{m\ge0} S_m(n)\,z^m.$$

We know that the generating function for $\langle 1, k, k^2, \ldots\rangle$ is

$$\frac{1}{1-kz} = \sum_{m\ge0} k^m z^m,$$

hence

$$S(z) = \sum_{m\ge0}\ \sum_{0\le k<n} k^m z^m = \sum_{0\le k<n}\frac{1}{1-kz}$$

by interchanging the order of summation. We can put this sum in closed
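form,

$$S(z) = \frac1z\Bigl(\frac{1}{z^{-1}-0} + \frac{1}{z^{-1}-1} + \cdots + \frac{1}{z^{-1}-n+1}\Bigr) = \frac1z\bigl(H_{z^{-1}} - H_{z^{-1}-n}\bigr); \qquad (7.76)$$

but we know nothing about expanding such a closed form in powers of $z$.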
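Exponential generating functions come to the rescue. The egf of our
sequence $\langle S_0(n), S_1(n), S_2(n), \ldots\rangle$ is

$$\hat{S}(z,n) = S_0(n) + S_1(n)\frac{z}{1!} + S_2(n)\frac{z^2}{2!} + \cdots = \sum_{m\ge0} S_m(n)\frac{z^m}{m!}.$$

To get these coefficients $S_m(n)$ we can use the egf for $\langle 1, k, k^2, \ldots\rangle$, namely

$$e^{kz} = \sum_{m\ge0} k^m\frac{z^m}{m!},$$

and we have

$$\hat{S}(z,n) = \sum_{m\ge0}\ \sum_{0\le k<n} k^m\frac{z^m}{m!} = \sum_{0\le k<n} e^{kz}.$$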
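And the latter sum is a geometric progression, so there’s a closed form

$$\hat{S}(z,n) = \frac{e^{nz} - 1}{e^z - 1}. \qquad (7.77)$$

All we need to do is figure out the coefficients of this relatively simple function,
and we’ll know $S_m(n)$, because $S_m(n) = m!\,[z^m]\,\hat{S}(z,n)$.
Here’s where Bernoulli numbers come into the picture. We observed a
moment ago that the egf for Bernoulli numbers is

$$\hat{B}(z) = \sum_{k\ge0} B_k\frac{z^k}{k!} = \frac{z}{e^z - 1};$$

hence we can write

$$\hat{S}(z,n) = \hat{B}(z)\,\frac{e^{nz} - 1}{z} = \Bigl(B_0\frac{z^0}{0!} + B_1\frac{z^1}{1!} + B_2\frac{z^2}{2!} + \cdots\Bigr)\Bigl(n\frac{z^0}{1!} + n^2\frac{z^1}{2!} + n^3\frac{z^2}{3!} + \cdots\Bigr).$$

The sum $S_m(n)$ is $m!$ times the coefficient of $z^m$ in this product. For example,

$$S_0(n) = 0!\Bigl(B_0\frac{n}{1!\,0!}\Bigr) = n;$$
$$S_1(n) = 1!\Bigl(B_0\frac{n^2}{2!\,0!} + B_1\frac{n}{1!\,1!}\Bigr) = \tfrac12 n^2 - \tfrac12 n;$$
$$S_2(n) = 2!\Bigl(B_0\frac{n^3}{3!\,0!} + B_1\frac{n^2}{2!\,1!} + B_2\frac{n}{1!\,2!}\Bigr) = \tfrac13 n^3 - \tfrac12 n^2 + \tfrac16 n.$$

We have therefore derived the formula $\Box_{n-1} = S_2(n) = \frac13 n(n-\frac12)(n-1)$ for
the umpteenth time, and this was the simplest derivation of all: In a few lines
we have found the general behavior of $S_m(n)$ for all $m$.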
The general formula can be written

$$S_{m-1}(n) = \frac1m\bigl(B_m(n) - B_m(0)\bigr), \qquad (7.78)$$

where $B_m(x)$ is the Bernoulli polynomial defined by

$$B_m(x) = \sum_k \binom{m}{k}B_k\,x^{m-k}. \qquad (7.79)$$

Here’s why: The Bernoulli polynomial is the binomial convolution of the
sequence $\langle B_0, B_1, B_2, \ldots\rangle$ with $\langle 1, x, x^2, \ldots\rangle$; hence the exponential generating
function for $\langle B_0(x), B_1(x), B_2(x), \ldots\rangle$ is the product of their egf’s,

$$\hat{B}(z,x) = \sum_{m\ge0} B_m(x)\frac{z^m}{m!} = \frac{z}{e^z - 1}\sum_{m\ge0} x^m\frac{z^m}{m!} = \frac{z\,e^{xz}}{e^z - 1}. \qquad (7.80)$$

Equation (7.78) follows because the egf for $\langle 0, S_0(n), 2S_1(n), \ldots\rangle$ is, by (7.77),

$$z\,\frac{e^{nz} - 1}{e^z - 1} = \hat{B}(z,n) - \hat{B}(z,0).$$
Let’s turn now to another problem for which egf’s are just the thing:
How many spanning trees are possible in the complete graph on n vertices
{1, 2, ..., n}? Let’s call this number \(t_n\). The complete graph has \(\frac12n(n-1)\)
edges, one edge joining each pair of distinct vertices; so we’re essentially
looking for the total number of ways to connect up n given things by drawing
n − 1 lines between them.
We have \(t_1 = t_2 = 1\). Also \(t_3 = 3\), because a complete graph on three
vertices is a fan of order 2; we know that \(f_2 = 3\). And there are sixteen
spanning trees when n = 4:
[Figure (7.81): the sixteen spanning trees of the complete graph on four vertices, drawn in two rows.]
Hence \(t_4 = 16\).
Our experience with the analogous problem for fans suggests that the best
way to tackle this problem is to single out one vertex, and to look at the blocks
or components that the spanning tree joins together when we ignore all edges
that touch the special vertex. If the non-special vertices form m components
of sizes \(k_1, k_2, \ldots, k_m\), then we can connect them to the special vertex in
\(k_1k_2\ldots k_m\) ways. For example, in the case n = 4, we can consider the lower
left vertex to be special. The top row of (7.81) shows \(3t_3\) cases where the other
three vertices are joined among themselves in \(t_3\) ways and then connected to
the lower left in 3 ways. The bottom row shows \(2\cdot1\times t_2t_1\times\binom32\) solutions where
the other three vertices are divided into components of sizes 2 and 1 in \(\binom32\)
ways; there’s also the case [tree diagram] where the other three vertices are completely
unconnected among themselves.
This line of reasoning leads to the recurrence
\[t_n = \sum_{m>0}\frac{1}{m!}\sum_{k_1+k_2+\cdots+k_m=n-1}\binom{n-1}{k_1,k_2,\ldots,k_m}\,k_1k_2\ldots k_m\,t_{k_1}t_{k_2}\ldots t_{k_m}\]
for all n > 1. Here’s why: There are \(\binom{n-1}{k_1,k_2,\ldots,k_m}\) ways to assign n − 1 elements
to a sequence of m components of respective sizes \(k_1, k_2, \ldots, k_m\); there are
\(t_{k_1}t_{k_2}\ldots t_{k_m}\) ways to connect up those individual components with spanning
trees; there are \(k_1k_2\ldots k_m\) ways to connect vertex n to those components; and
we divide by m! because we want to disregard the order of the components.
For example, when n = 4 the recurrence says that
\[t_4 = 3t_3 + \tfrac12\Bigl(\tbinom{3}{1,2}\,2t_1t_2 + \tbinom{3}{2,1}\,2t_2t_1\Bigr) + \tfrac16\Bigl(\tbinom{3}{1,1,1}\,t_1^3\Bigr) = 3t_3 + 6t_2t_1 + t_1^3.\]
The recurrence for \(t_n\) looks formidable at first, possibly even frightening;
but it really isn’t bad, only convoluted. We can define
\[u_n = n\,t_n\]
and then everything simplifies considerably:
\[\frac{u_n}{n!} = \sum_{m>0}\frac{1}{m!}\sum_{k_1+k_2+\cdots+k_m=n-1}\frac{u_{k_1}}{k_1!}\frac{u_{k_2}}{k_2!}\ldots\frac{u_{k_m}}{k_m!}, \qquad\text{if } n>1. \tag{7.82}\]
The inner sum is the coefficient of \(z^{n-1}\) in the egf \(\hat U(z)\), raised to the mth
power; and we obtain the correct formula also when n = 1, if we add in the
term \(\hat U(z)^0\) that corresponds to the case m = 0. So
\[\frac{u_n}{n!} = [z^{n-1}]\sum_{m\ge0}\frac{1}{m!}\hat U(z)^m = [z^{n-1}]\,e^{\hat U(z)} = [z^n]\,z\,e^{\hat U(z)}\]
for all n > 0, and we have the equation
\[\hat U(z) = z\,e^{\hat U(z)}. \tag{7.83}\]
Progress! Equation (7.83) is almost like
\[E(z) = e^{zE(z)},\]
which defines the generalized exponential series \(E(z) = E_1(z)\) in (5.59) and
(7.70); indeed, we have
\[\hat U(z) = zE(z).\]
So we can read off the answer to our problem:
\[t_n = \frac{u_n}{n} = (n-1)!\,[z^n]\,\hat U(z) = (n-1)!\,[z^{n-1}]\,E(z) = n^{n-2}. \tag{7.84}\]
The complete graph on {1, 2, ..., n} has exactly \(n^{n-2}\) spanning trees, for all
n > 0.
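The functional equation (7.83) can be checked numerically. The sketch below is an illustration under our own assumptions, not the book's method: it builds the coefficients \(c_n = u_n/n!\) of \(\hat U(z)\) one at a time from \([z^n]\,\hat U = [z^{n-1}]\,e^{\hat U}\), using the standard identity \(E' = \hat U'E\) for \(E = e^{\hat U}\) to extend the exponential. The name `spanning_tree_counts` is hypothetical.

```python
from fractions import Fraction
from math import factorial

def spanning_tree_counts(N):
    """Return [t_1, ..., t_N] computed from U(z) = z * exp(U(z)), with u_n = n * t_n."""
    c = [Fraction(0)] * (N + 1)   # c[n] = u_n / n!, the coefficients of U(z)
    e = [Fraction(0)] * (N + 1)   # e[n] = [z^n] exp(U(z))
    e[0] = Fraction(1)
    for n in range(1, N + 1):
        # [z^n] U(z) = [z^(n-1)] exp(U(z)), by (7.83)
        c[n] = e[n - 1]
        # E = exp(U) satisfies E' = U' E, hence n e_n = sum_{k=1..n} k c_k e_{n-k}
        e[n] = sum(k * c[k] * e[n - k] for k in range(1, n + 1)) / n
    return [int(c[n] * factorial(n)) // n for n in range(1, N + 1)]

if __name__ == "__main__":
    print(spanning_tree_counts(8))
```

Comparing the output with \(n^{n-2}\) for small n gives a quick sanity check of (7.84).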
7.7 DIRICHLET GENERATING FUNCTIONS
There are many other possible ways to generate a sequence from a
series; any system of “kernel” functions \(K_n(z)\) such that
\[\sum_n g_n K_n(z) = 0 \implies g_n = 0 \text{ for all } n\]
can be used, at least in principle. Ordinary generating functions use \(K_n(z) = z^n\),
and exponential generating functions use \(K_n(z) = z^n/n!\); we could also try
falling factorial powers \(z^{\underline{n}}\), or binomial coefficients \(z^{\underline{n}}/n! = \binom{z}{n}\).
The most important alternative to gf’s and egf’s uses the kernel functions
\(1/n^z\); it is intended for sequences \(\langle g_1, g_2, \ldots\rangle\) that begin with n = 1 instead
of n = 0:
\[\tilde G(z) = \sum_{n\ge1}\frac{g_n}{n^z}. \tag{7.85}\]
This is called a Dirichlet generating function (dgf), because the German
mathematician Gustav Lejeune Dirichlet (1805-1859) made much of it.
For example, the dgf of the constant sequence \(\langle 1, 1, 1, \ldots\rangle\) is
\[\sum_{n\ge1}\frac{1}{n^z} = \zeta(z). \tag{7.86}\]
This is Riemann’s zeta function, which we have also called the generalized
harmonic number \(H_\infty^{(z)}\) when z > 1.
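The product of Dirichlet generating functions corresponds to a special
kind of convolution:
\[\tilde F(z)\,\tilde G(z) = \sum_{l,m\ge1}\frac{f_l}{l^z}\frac{g_m}{m^z} = \sum_{n\ge1}\frac{1}{n^z}\sum_{l,m\ge1} f_l\,g_m\,[l\cdot m=n].\]
Thus \(\tilde F(z)\,\tilde G(z) = \tilde H(z)\) is the dgf of the sequence
\[h_n = \sum_{d\backslash n} f_d\,g_{n/d}. \tag{7.87}\]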
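For example, we know from (4.55) that \(\sum_{d\backslash n}\mu(d) = [n=1]\); this is
the Dirichlet convolution of the Möbius sequence \(\langle\mu(1), \mu(2), \mu(3), \ldots\rangle\) with
\(\langle 1, 1, 1, \ldots\rangle\), hence
\[\tilde M(z)\,\zeta(z) = \sum_{n\ge1}\frac{[n=1]}{n^z} = 1. \tag{7.88}\]
In other words, the dgf of \(\langle\mu(1), \mu(2), \mu(3), \ldots\rangle\) is \(\zeta(z)^{-1}\).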
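The convolution (7.87) is easy to experiment with. The following Python sketch is only an illustration: `dirichlet_convolve` and `mobius` are names we introduce here, sequences are stored in lists with index 0 unused, and the Möbius function is computed by trial division rather than from (4.55), so that checking \(\mu * 1 = [n=1]\) is not circular.

```python
def dirichlet_convolve(f, g, N):
    """h[n] = sum over divisors d of n of f[d] * g[n // d], for 1 <= n <= N."""
    h = [0] * (N + 1)
    for d in range(1, N + 1):
        for q in range(1, N // d + 1):
            h[d * q] += f[d] * g[q]
    return h

def mobius(n):
    """Moebius function via trial division: 0 if a square divides n, else (-1)^(number of primes)."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # p^2 divided the original n
            result = -result
        p += 1
    return -result if n > 1 else result

if __name__ == "__main__":
    N = 30
    mu = [0] + [mobius(n) for n in range(1, N + 1)]
    ones = [0] + [1] * N
    print(dirichlet_convolve(mu, ones, N)[1:])   # the sequence [n = 1]
```

The double loop over d and q visits each pair with dq ≤ N once, so the cost is about N ln N additions.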
Dirichlet generating functions are particularly valuable when the sequence
\(\langle g_1, g_2, \ldots\rangle\) is a multiplicative function, namely when
\[g_{mn} = g_m\,g_n \qquad\text{for } m\perp n.\]
In such cases the values of \(g_n\) for all n are determined by the values of \(g_n\) when
n is a power of a prime, and we can factor the dgf into a product over primes:
\[\tilde G(z) = \prod_{p\ \mathrm{prime}}\Bigl(1+\frac{g_p}{p^z}+\frac{g_{p^2}}{p^{2z}}+\frac{g_{p^3}}{p^{3z}}+\cdots\Bigr). \tag{7.89}\]
If, for instance, we set \(g_n = 1\) for all n, we obtain a product representation
of Riemann’s zeta function:
\[\zeta(z) = \prod_{p\ \mathrm{prime}}\Bigl(\frac{1}{1-p^{-z}}\Bigr). \tag{7.90}\]
The Möbius function has \(\mu(p) = -1\) and \(\mu(p^k) = 0\) for k > 1, hence its dgf is
\[\tilde M(z) = \prod_{p\ \mathrm{prime}}(1-p^{-z}); \tag{7.91}\]
this agrees, of course, with (7.88) and (7.90). Euler’s \(\varphi\) function has
\(\varphi(p^k) = p^k - p^{k-1}\), hence its dgf has the factored form
\[\tilde\Phi(z) = \prod_{p\ \mathrm{prime}}\Bigl(1+\frac{p-1}{p^z-p}\Bigr) = \prod_{p\ \mathrm{prime}}\frac{1-p^{-z}}{1-p^{1-z}}.\]
We conclude that \(\tilde\Phi(z) = \zeta(z-1)/\zeta(z)\).
Exercises
Warmups
1 An eccentric collector of 2 × n domino tilings pays $4 for each vertical
domino and $1 for each horizontal domino. How many tilings are worth
exactly $m by this criterion? For example, when m = 6 there are three
solutions: [three tilings pictured].
2 Give the generating function and the exponential generating function for
the sequence \(\langle 2, 5, 13, 35, \ldots\rangle = \langle 2^n + 3^n\rangle\) in closed form.
3 What is \(\sum_{n\ge0} H_n/10^n\)?
4 The general expansion theorem for rational functions P(z)/Q(z) is not
completely general, because it restricts the degree of P to be less than
the degree of Q. What happens if P has a larger degree than this?
5 Find a generating function S(z) such that
\[[z^n]\,S(z) = \sum_k\binom{r}{k}\binom{r}{n-2k}.\]
Basics
6 Show that the recurrence (7.32) can be solved by the repertoire method,
without using generating functions.
7 Solve the recurrence
\[g_0 = 1;\qquad g_n = g_{n-1} + 2g_{n-2} + \cdots + n\,g_0, \quad\text{for } n>0.\]
8 What is \([z^n]\,(\ln(1-z))^2/(1-z)^{m+1}\)?
9 Use the result of the previous exercise to evaluate \(\sum_{k=0}^{n} H_kH_{n-k}\).
10 Set r = s = −1/2 in identity (7.61) and then remove all occurrences of
1/2 by using tricks like (5.36). What amazing identity do you deduce?
[Margin note: I deduce that Clark Kent is really superman.]
11 This problem, whose three parts are independent, gives practice in the
manipulation of generating functions. We assume that \(A(z) = \sum_n a_nz^n\),
\(B(z) = \sum_n b_nz^n\), \(C(z) = \sum_n c_nz^n\), and that the coefficients are zero for
negative n.
a If \(c_n = \sum_{j+2k\le n} a_jb_k\), express C in terms of A and B.
b If \(n\,b_n = \sum_{k=0}^{n} 2^ka_k/(n-k)!\), express A in terms of B.
c If r is a real number and if \(a_n = \sum_{k=0}^{n}\binom{r+k}{k}b_{n-k}\), express A in
terms of B; then use your formula to find coefficients \(f_k(r)\) such that
\(b_n = \sum_{k=0}^{n} f_k(r)\,a_{n-k}\).
12 How many ways are there to put the numbers {1, 2, ..., 2n} into a 2 × n
array so that rows and columns are in increasing order from left to right
and from top to bottom? For example, one solution when n = 5 is
\[\begin{pmatrix} 1 & 2 & 4 & 5 & 8\\ 3 & 6 & 7 & 9 & 10 \end{pmatrix}.\]
13 Prove Raney’s generalized lemma, which is stated just before
(7.6~).
14 Solve the recurrence
go
=
0,
91
= 1,
gkgn-k,
for n >
1,
by using an exponential generating function.
15 The Bell number \(\varpi_n\) is the number of ways to partition n things into
subsets. For example, \(\varpi_3 = 5\) because we can partition {1, 2, 3} in the
following ways:
{1,2,3};  {1,2} ∪ {3};  {1,3} ∪ {2};  {1} ∪ {2,3};  {1} ∪ {2} ∪ {3}.
Prove that \(\varpi_{n+1} = \sum_k\binom{n}{k}\varpi_{n-k}\), and use this recurrence to find a closed
form for the exponential generating function \(\sum_n \varpi_nz^n/n!\).
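16 Two sequences \(\langle a_n\rangle\) and \(\langle b_n\rangle\) are related by the convolution formula
\[b_n = \sum_{k_1+2k_2+\cdots+nk_n=n}\binom{a_1+k_1-1}{k_1}\binom{a_2+k_2-1}{k_2}\cdots\binom{a_n+k_n-1}{k_n};\]
also \(a_0 = 0\) and \(b_0 = 1\). Prove that the corresponding generating functions
satisfy \(\ln B(z) = A(z) + \frac12A(z^2) + \frac13A(z^3) + \cdots\).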
17 Show that the exponential generating function \(\hat G(z)\) of a sequence is related
to the ordinary generating function G(z) by the formula
\[\int_0^\infty \hat G(zt)\,e^{-t}\,dt = G(z),\]
if the integral exists.
18 Find the Dirichlet generating functions for the sequences
a \(g_n = \sqrt n\);
b \(g_n = \ln n\);
c \(g_n = [n \text{ is squarefree}]\).
Express your answers in terms of the zeta function. (Squarefreeness is
defined in exercise 4.13.)
19 Every power series \(F(z) = \sum_{n\ge0} f_nz^n\) with \(f_0 = 1\) defines a sequence of
polynomials \(f_n(x)\) by the rule
\[F(z)^x = \sum_{n\ge0} f_n(x)\,z^n,\]
where \(f_n(1) = f_n\) and \(f_n(0) = [n=0]\). In general, \(f_n(x)\) has degree n.
Show that such polynomials always satisfy the convolution formulas
\[\sum_{k=0}^{n} f_k(x)\,f_{n-k}(y) = f_n(x+y);\]
\[(x+y)\sum_{k=0}^{n} k\,f_k(x)\,f_{n-k}(y) = x\,n\,f_n(x+y).\]
(The identities in Tables 202 and 258 are special cases of this trick.)
20 A power series G(z) is called differentiably finite if there exist finitely
many polynomials \(P_0(z), \ldots, P_m(z)\), not all zero, such that
\[P_0(z)\,G(z) + P_1(z)\,G'(z) + \cdots + P_m(z)\,G^{(m)}(z) = 0.\]
A sequence of numbers \(\langle g_0, g_1, g_2, \ldots\rangle\) is called polynomially recursive
if there exist finitely many polynomials \(p_0(z), \ldots, p_m(z)\), not all zero,
such that
\[p_0(n)\,g_n + p_1(n)\,g_{n+1} + \cdots + p_m(n)\,g_{n+m} = 0\]
for all integers n ≥ 0. Prove that a generating function is differentiably
finite if and only if its sequence of coefficients is polynomially recursive.
Homework exercises
21 A robber holds up a bank and demands $500 in tens and twenties. He
also demands to know the number of ways in which the cashier can give
him the money. Find a generating function G(z) for which this number
is \([z^{500}]\,G(z)\), and a more compact generating function \(\check G(z)\) for which
this number is \([z^{50}]\,\check G(z)\). Determine the required number of ways by
(a) using partial fractions; (b) using a method like (7.39).
[Margin note: Will he settle for 2 × n domino tilings?]
22 Let P be the sum of all ways to “triangulate” polygons:
[figure: the series P of triangulated polygons]
(The first term represents a degenerate polygon with only two vertices;
every other term shows a polygon that has been divided into triangles.
For example, a pentagon can be triangulated in five ways.) Define a
“multiplication” operation A△B on triangulated polygons A and B so
that the equation
P = _ + P△P
is valid. Then replace each triangle by ‘z’; what does this tell you about
the number of ways to decompose an n-gon into triangles?
23 In how many ways can a 2 × 2 × n pillar be built out of 2 × 1 × 1 bricks?
[Margin note: At union rates, as many as you can afford, plus a few.]
24 How many spanning trees are in an n-wheel (a graph with n “outer”
vertices in a cycle, each connected to an (n + 1)st “hub” vertex), when
n ≥ 3?
25 Let m ≥ 2 be an integer. What is a closed form for the generating
function of the sequence \(\langle n \bmod m\rangle\), as a function of z and m? Use
this generating function to express ‘n mod m’ in terms of the complex
number \(\omega = e^{2\pi i/m}\). (For example, when m = 2 we have ω = −1 and
\(n \bmod 2 = \frac12 - \frac12(-1)^n\).)
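26 The second-order Fibonacci numbers \(\langle\mathfrak F_n\rangle\) are defined by the recurrence
\[\mathfrak F_0 = 0;\qquad \mathfrak F_1 = 1;\qquad \mathfrak F_n = \mathfrak F_{n-1} + \mathfrak F_{n-2} + F_n, \quad\text{for } n>1.\]
Express \(\mathfrak F_n\) in terms of the usual Fibonacci numbers \(F_n\) and \(F_{n+1}\).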
27 A 2 × n domino tiling can also be regarded as a way to draw n disjoint
lines in a 2 × n array of points:
[figure]
If we superimpose two such patterns, we get a set of cycles, since every
point is touched by two lines. For example, if the lines above are
combined with the lines
[figure]
the result is
[figure]
The same set of cycles is also obtained by combining
[figure] with [figure].
But we get a unique way to reconstruct the original patterns from the
superimposed ones if we assign orientations to the vertical lines by using
arrows that go alternately up/down/up/down/... in the first pattern and
alternately down/up/down/up/... in the second. For example,
[figure]
The number of such oriented cycle patterns must therefore be \(T_n^2 = F_{n+1}^2\),
and we should be able to prove this via algebra. Let \(Q_n\) be the number
of oriented 2 × n cycle patterns. Find a recurrence for \(Q_n\), solve it with
generating functions, and deduce algebraically that \(Q_n = F_{n+1}^2\).
28 The coefficients of A(z) in (7.39) satisfy \(A_r + A_{r+10} + A_{r+20} + A_{r+30} = 100\)
for 0 ≤ r < 10. Find a “simple” explanation for this.
29 What is the sum of Fibonacci products
\[\sum_{m>0}\ \sum_{\substack{k_1+k_2+\cdots+k_m=n\\ k_1,k_2,\ldots,k_m>0}} F_{k_1}F_{k_2}\ldots F_{k_m}\ ?\]
30 If the generating function \(G(z) = 1/(1-\alpha z)(1-\beta z)\) has the partial
fraction decomposition \(a/(1-\alpha z) + b/(1-\beta z)\), what is the partial fraction
decomposition of \(G(z)^n\)?
31 What function g(n) of the positive integer n satisfies the recurrence
\[\sum_{d\backslash n} g(d)\,\varphi(n/d) = 1,\]
where \(\varphi\) is Euler’s totient function?
32 An arithmetic progression is an infinite set of integers
\[\{an+b\} = \{b,\ a+b,\ 2a+b,\ 3a+b,\ \ldots\}.\]
A set of arithmetic progressions \(\{a_1n+b_1\}, \ldots, \{a_mn+b_m\}\) is called an
exact cover if every nonnegative integer occurs in one and only one of the
progressions. For example, the three progressions {2n}, {4n + 1}, {4n + 3}
constitute an exact cover. Show that if \(\{a_1n+b_1\}, \ldots, \{a_mn+b_m\}\) is an
exact cover such that \(2 \le a_1 \le \cdots \le a_m\), then \(a_{m-1} = a_m\). Hint: Use
generating functions.
Exam problems
33 What is \([w^mz^n]\,(\ln(1+z))/(1-wz)\)?
34 Find a closed form for the generating function \(\sum_{n\ge0} G_n(z)\,w^n\), if
(Here m is a fixed positive integer.)
35 Evaluate the sum \(\sum_{0<k<n} 1/k(n-k)\) in two ways:
a
Expand the summand in partial fractions.
b Treat the sum as a convolution and use generating functions.
36 Let A(z) be the generating function for \(\langle a_0, a_1, a_2, a_3, \ldots\rangle\). Express
\(\sum_n a_{\lfloor n/m\rfloor}z^n\) in terms of A, z, and m.
37 Let \(a_n\) be the number of ways to write the positive integer n as a sum of
powers of 2, disregarding order. For example, \(a_4 = 4\), since 4 = 2 + 2 =
2 + 1 + 1 = 1 + 1 + 1 + 1. By convention we let \(a_0 = 1\). Let \(b_n = \sum_{k=0}^{n} a_k\)
be the cumulative sum of the first a’s.
a Make a table of the a’s and b’s up through n = 10. What amazing
relation do you observe in your table? (Don’t prove it yet.)
b Express the generating function A(z) as an infinite product.
c Use the expression from part (b) to prove the result of part (a).
38 Find a closed form for the double generating function
\[M(w,z) = \sum_{m,n\ge0}\min(m,n)\,w^mz^n.\]
Generalize your answer to obtain, for fixed m ≥ 2, a closed form for
\[M(z_1,\ldots,z_m) = \sum_{n_1,\ldots,n_m\ge0}\min(n_1,\ldots,n_m)\,z_1^{n_1}\ldots z_m^{n_m}.\]
39 Given positive integers m and n, find closed forms for
\[\sum_{1\le k_1<k_2<\cdots<k_m\le n} k_1k_2\ldots k_m \qquad\text{and}\qquad \sum_{1\le k_1\le k_2\le\cdots\le k_m\le n} k_1k_2\ldots k_m.\]
(For example, when m = 2 and n = 3 the sums are 1·2 + 1·3 + 2·3 and
1·1 + 1·2 + 1·3 + 2·2 + 2·3 + 3·3.) Hint: What are the coefficients of \(z^m\) in the
generating functions \((1+a_1z)\ldots(1+a_nz)\) and \(1/(1-a_1z)\ldots(1-a_nz)\)?
40 Express \(\sum_k\binom{n}{k}(kF_{k-1}-F_k)\,(n-k)¡\) in closed form.
41 An up-down permutation of order n is an arrangement \(a_1a_2\ldots a_n\) of
the integers {1, 2, ..., n} that goes alternately up and down:
\[a_1 < a_2 > a_3 < a_4 > \cdots.\]
For example, 35142 is an up-down permutation of order 5. If \(A_n\) denotes
the number of up-down permutations of order n, show that the
exponential generating function of \(\langle A_n\rangle\) is (1 + sin z)/cos z.
42 A space probe has discovered that organic material on Mars has DNA
composed of five symbols, denoted by (a, b, c, d, e), instead of the four
components in earthling DNA. The four pairs cd, ce, ed, and ee never
occur consecutively in a string of Martian DNA, but any string without
forbidden pairs is possible. (Thus bbcda is forbidden but bbdca is
OK.) How many Martian DNA strings of length n are possible? (When
n = 2 the answer is 21, because the left and right ends of a string are
distinguishable.)
43 The Newtonian generating function of a sequence \(\langle g_n\rangle\) is defined to be
\[\dot g(z) = \sum_n g_n\binom{z}{n}.\]
Find a convolution formula that defines the relation between sequences
\(\langle f_n\rangle\), \(\langle g_n\rangle\), and \(\langle h_n\rangle\) whose Newtonian generating functions are related
by the equation \(\dot f(z)\,\dot g(z) = \dot h(z)\). Try to make your formula as simple
and symmetric as possible.
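44 Let \(q_n\) be the number of possible outcomes when n numbers \(\{x_1, \ldots, x_n\}\)
are compared with each other. For example, \(q_3 = 13\) because the
possibilities are
\[\begin{array}{lll}
x_1<x_2<x_3; & x_1<x_2=x_3; & x_1<x_3<x_2;\\
x_1=x_2<x_3; & x_1=x_2=x_3; & x_1=x_3<x_2;\\
x_2<x_1<x_3; & x_2<x_1=x_3; & x_2<x_3<x_1;\\
x_2=x_3<x_1; & x_3<x_1<x_2; & x_3<x_1=x_2;\\
x_3<x_2<x_1.
\end{array}\]
Find a closed form for the egf \(\hat Q(z) = \sum_n q_nz^n/n!\). Also find sequences
\(\langle a_n\rangle\), \(\langle b_n\rangle\), \(\langle c_n\rangle\) such that
\[q_n = \sum_{k\ge0} k^na_k = \sum_k {n\brace k}\,b_k = \sum_k {n\atopwithdelims\langle\rangle k}\,c_k, \qquad\text{for all } n>0.\]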
45 Evaluate \(\sum_{m,n>0}\,[m\perp n]/m^2n^2\).
46 Evaluate
\[\sum_{0\le k\le n/2}\binom{n-2k}{k}\Bigl(\frac{-4}{27}\Bigr)^k\]
in closed form. Hint: \(z^3 - z^2 + \frac{4}{27} = (z+\frac13)(z-\frac23)^2\).
47 Show that the numbers \(U_n\) and \(V_n\) of 3 × n domino tilings, as given in
(7.34), are closely related to the fractions in the Stern-Brocot tree that
converge to \(\sqrt3\).
48 A certain sequence \(\langle g_n\rangle\) satisfies the recurrence
\[a\,g_n + b\,g_{n+1} + c\,g_{n+2} + d = 0, \qquad\text{integer } n\ge0,\]
for some integers (a, b, c, d) with gcd(a, b, c, d) = 1. It also has the closed
form
\[g_n = \bigl\lfloor\alpha(1+\sqrt2)^n\bigr\rfloor, \qquad\text{integer } n\ge0,\]
for some real number α between 0 and 1. Find a, b, c, d, and α.
49 This is a problem about powers and parity.
[Margin note: Kissinger, take note.]
a Consider the sequence \(\langle a_0, a_1, a_2, \ldots\rangle = \langle 2, 2, 6, \ldots\rangle\) defined by the
formula
\[a_n = (1+\sqrt2)^n + (1-\sqrt2)^n.\]
Find a simple recurrence relation that is satisfied by this sequence.
b Prove that \(\lceil(1+\sqrt2)^n\rceil \equiv n \pmod 2\) for all integers n > 0.
c Find a number α of the form \((p+\sqrt q)/2\), where p and q are positive
integers, such that \(\lfloor\alpha^n\rfloor \equiv n \pmod 2\) for all integers n > 0.
Bonus problems
50 Continuing exercise 22, consider the sum of all ways to decompose
polygons into polygons:
[figure: the series Q of all polygon decompositions]
Find a symbolic equation for Q and use it to find a generating function
for the number of ways to draw nonintersecting diagonals inside a convex
n-gon. (Give a closed form for the generating function as a function of z;
you need not find a closed form for the coefficients.)
51 Prove that the product
pw
cos
2
jn
-
mfl
is the generating function for tilings of an m × n rectangle with dominoes.
(There are mn factors, which we can imagine are written in the mn cells
of the rectangle. If mn is odd, the middle factor is zero. The coefficient
of ▯^j ▭^k is the number of ways to do the tiling with j vertical and k
horizontal dominoes.) Hint: This is a difficult problem, really beyond
the scope of this book. You may wish to simply verify the formula in the
case m = 3, n = 4.
[Margin note: Is this a hint or a warning?]
52
Prove that the polynomials defined by the recurrence
P*(Y)
=
(Y
-
;)”
-
ng
(;)
(;)n-kpkh),
integer n 3 0,
have the form p,,(y) = x.“,=, IcIy”, where Ii1 is a positive integer for
1 6 m 6 n. Hint: This exercise is very instructive but not very easy.
53 The sequence of pentagonal numbers \(\langle 1, 5, 12, 22, \ldots\rangle\) generalizes the
triangular and square numbers in an obvious way:
Let the nth triangular number be \(T_n = n(n+1)/2\); let the nth pentagonal
number be \(P_n = n(3n-1)/2\); and let \(U_n\) be the 3 × n domino-tiling
number defined in (7.38). Prove that the triangular number \(T_{(U_{4n+2}-1)/2}\)
is also a pentagonal number. Hint: \(3U_{2n}^2 = (V_{2n-1}+V_{2n+1})^2 + 2\).
54 Consider the following curious construction:
1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  ...
1   2   3   4       6   7   8   9      11  12  13  14      16  ...
1   3   6  10      16  23  31  40      51  63  76  90     106  ...
1   3   6          16  23  31          51  63  76         106  ...
1   4  10          26  49  80         131 194 270         376  ...
1   4              26  49             131 194             376  ...
1   5              31  80             211 405             781  ...
1                  31                 211                 781  ...
1                  32                 243                1024  ...
(Start with a row containing all the positive integers. Then delete every
mth column; here m = 5. Then replace the remaining entries by partial
sums. Then delete every (m − 1)st column. Then replace with partial
sums again, and so on.) Use generating functions to show that the final
result is the sequence of mth powers. For example, when m = 5 we get
\(\langle 1^5, 2^5, 3^5, 4^5, \ldots\rangle\) as shown.
55 Prove that if the power series F(z) and G(z) are differentiably finite (as
defined in exercise 20), then so are F(z) + G(z) and F(z)G(z).
Research problems
56 Prove that there is no “simple closed form” for the coefficient of \(z^n\) in
\((1+z+z^2)^n\), as a function of n, in some large class of “simple closed
forms.”
57 Prove or disprove: If all the coefficients of G(z) are either 0 or 1, and if
all the coefficients of \(G(z)^2\) are less than some constant M, then infinitely
many of the coefficients of \(G(z)^2\) are zero.
8
Discrete Probability
THE ELEMENT OF CHANCE enters into many of our attempts to under-
stand the world we live in. A mathematical theory of probability allows us
to calculate the likelihood of complex events if we assume that the events are
governed by appropriate axioms. This theory has significant applications in
all branches of science, and it has strong connections with the techniques we
have studied in previous chapters.
Probabilities are called “discrete” if we can compute the probabilities of
all events by summation instead of by integration. We are getting pretty good
at sums, so it should come as no great surprise that we are ready to apply
our knowledge to some interesting calculations of probabilities and averages.
8.1 DEFINITIONS
Probability theory starts with the idea of a probability space, which
is a set Ω of all things that can happen in a given problem together with a
rule that assigns a probability Pr(ω) to each elementary event ω ∈ Ω. The
probability Pr(ω) must be a nonnegative real number, and the condition
\[\sum_{\omega\in\Omega}\Pr(\omega) = 1 \tag{8.1}\]
must hold in every discrete probability space. Thus, each value Pr(ω) must lie
in the interval [0 . . 1]. We speak of Pr as a probability distribution, because
it distributes a total probability of 1 among the events ω.
[Margin note: (Readers unfamiliar with probability theory will, with high probability, benefit from a perusal of Feller’s classic introduction to the subject [96].)]
Here’s an example: If we’re rolling a pair of dice, the set Ω of elementary
events is \(D^2\) = {⚀⚀, ⚀⚁, ..., ⚅⚅}, where
D = {⚀, ⚁, ⚂, ⚃, ⚄, ⚅}
is the set of all six ways that a given die can land. Two rolls such as ⚀⚁
and ⚁⚀ are considered to be distinct; hence this probability space has a
total of \(6^2 = 36\) elements.
[Margin note: Never say die.]
We usually assume that dice are “fair,” namely that each of the six
possibilities for a particular die has probability 1/6, and that each of the 36 possible
rolls in Ω has probability 1/36. But we can also consider “loaded” dice in which
there is a different distribution of probabilities. For example, let
\[\Pr\nolimits_1(⚀) = \Pr\nolimits_1(⚅) = \tfrac14;\qquad \Pr\nolimits_1(⚁) = \Pr\nolimits_1(⚂) = \Pr\nolimits_1(⚃) = \Pr\nolimits_1(⚄) = \tfrac18.\]
[Margin note: Careful: They might go off.]
Then \(\sum_{d\in D}\Pr_1(d) = 1\), so \(\Pr_1\) is a probability distribution on the set D, and
we can assign probabilities to the elements of \(\Omega = D^2\) by the rule
\[\Pr\nolimits_{11}(d\,d') = \Pr\nolimits_1(d)\,\Pr\nolimits_1(d'). \tag{8.2}\]
For example, \(\Pr_{11}(⚅⚂) = \frac14\cdot\frac18 = \frac1{32}\). This is a valid distribution because
\[\begin{aligned}
\sum_{\omega\in\Omega}\Pr\nolimits_{11}(\omega) &= \sum_{dd'\in D^2}\Pr\nolimits_{11}(d\,d') = \sum_{d,d'\in D}\Pr\nolimits_1(d)\,\Pr\nolimits_1(d')\\
&= \sum_{d\in D}\Pr\nolimits_1(d)\sum_{d'\in D}\Pr\nolimits_1(d') = 1\cdot1 = 1.
\end{aligned}\]
We can also consider the case of one fair die and one loaded die,
\[\Pr\nolimits_{01}(d\,d') = \Pr\nolimits_0(d)\,\Pr\nolimits_1(d'), \qquad\text{where } \Pr\nolimits_0(d) = \tfrac16, \tag{8.3}\]
in which case \(\Pr_{01}(⚅⚂) = \frac16\cdot\frac18 = \frac1{48}\). Dice in the “real world” can’t really
be expected to turn up equally often on each side, because there is not perfect
symmetry; but 1/6 is usually pretty close to the truth.
[Margin note: If all sides of a cube were identical, how could we tell which side is face up?]
An event is a subset of Ω. In dice games, for example, the set
{⚀⚀, ⚁⚁, ⚂⚂, ⚃⚃, ⚄⚄, ⚅⚅}
is the event that “doubles are thrown.” The individual elements ω of Ω are
called elementary events because they cannot be decomposed into smaller
subsets; we can think of ω as a one-element event {ω}.
The probability of an event A is defined by the formula
\[\Pr(\omega\in A) = \sum_{\omega\in A}\Pr(\omega); \tag{8.4}\]
and in general if R(ω) is any statement about ω, we write ‘Pr(R(ω))’ for the
sum of all Pr(ω) such that R(ω) is true. Thus, for example, the probability of
doubles with fair dice is \(\frac1{36}+\frac1{36}+\frac1{36}+\frac1{36}+\frac1{36}+\frac1{36} = \frac16\); but when both dice are
loaded with probability distribution \(\Pr_1\) it is
\[\tfrac1{16}+\tfrac1{64}+\tfrac1{64}+\tfrac1{64}+\tfrac1{64}+\tfrac1{16} = \tfrac3{16} > \tfrac16.\]
Loading the dice makes the event “doubles are thrown” more probable.
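The definitions so far can be made concrete with a small Python sketch, offered as an illustration rather than as part of the original text. The names `fair`, `loaded`, `pair_space`, and `pr` are our own; faces are represented by the integers 1 to 6 and a roll by an ordered pair.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)

# Pr_0: a fair die.  Pr_1: the loaded die, with faces 1 and 6 at 1/4 and the rest at 1/8.
fair = {d: Fraction(1, 6) for d in FACES}
loaded = {d: Fraction(1, 4) if d in (1, 6) else Fraction(1, 8) for d in FACES}

def pair_space(first, second):
    """Product distribution on D^2, as in (8.2) and (8.3): Pr(d d') = first(d) * second(d')."""
    return {(d, e): first[d] * second[e] for d, e in product(FACES, FACES)}

def pr(space, event):
    """Pr(R(w)) as in (8.4): the sum of Pr(w) over all elementary events w satisfying R."""
    return sum(p for w, p in space.items() if event(w))

def doubles(w):
    return w[0] == w[1]

if __name__ == "__main__":
    print(pr(pair_space(fair, fair), doubles))       # fair dice
    print(pr(pair_space(loaded, loaded), doubles))   # both dice loaded
```

Exact fractions are used so that the sums can be compared with the values in the text without rounding.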
(We have been using Σ-notation in a more general sense here than
defined in Chapter 2: The sums in (8.1) and (8.4) occur over all elements ω
of an arbitrary set, not over integers only. However, this new development is
not really alarming; we can agree to use special notation under a Σ whenever
nonintegers are intended, so there will be no confusion with our ordinary
conventions. The other definitions in Chapter 2 are still valid; in particular, the
definition of infinite sums in that chapter gives the appropriate interpretation
to our sums when the set Ω is infinite. Each probability is nonnegative, and
the sum of all probabilities is bounded, so the probability of event A in (8.4)
is well defined for all subsets A ⊆ Ω.)
A random variable is a function defined on the elementary events ω of a
probability space. For example, if \(\Omega = D^2\) we can define S(ω) to be the sum
of the spots on the dice roll ω, so that S(⚅⚂) = 6 + 3 = 9. The probability
that the spots total seven is the probability of the event S(ω) = 7, namely
Pr(⚀⚅) + Pr(⚁⚄) + Pr(⚂⚃) + Pr(⚃⚂) + Pr(⚄⚁) + Pr(⚅⚀).
With fair dice (\(\Pr = \Pr_{00}\)), this happens with probability 1/6; with loaded dice
(\(\Pr = \Pr_{11}\)), it happens with probability \(\frac1{16}+\frac1{64}+\frac1{64}+\frac1{64}+\frac1{64}+\frac1{16} = \frac3{16}\),
the same as we observed for doubles.
It’s customary to drop the ‘(ω)’ when we talk about random variables,
because there’s usually only one probability space involved when we’re working
on any particular problem. Thus we say simply ‘S = 7’ for the event that
a 7 was rolled, and ‘S = 4’ for the event {⚀⚂, ⚁⚁, ⚂⚀}.
A random variable can be characterized by the probability distribution of
its values. Thus, for example, S takes on eleven possible values {2, 3, ..., 12},
and we can tabulate the probability that S = s for each s in this set:
s             2     3     4     5     6     7      8     9     10    11    12
Pr_00(S=s)    1/36  2/36  3/36  4/36  5/36  6/36   5/36  4/36  3/36  2/36  1/36
Pr_11(S=s)    4/64  4/64  5/64  6/64  7/64  12/64  7/64  6/64  5/64  4/64  4/64
If we’re working on a problem that involves only the random variable S and no
other properties of dice, we can compute the answer from these probabilities
alone, without regard to the details of the set \(\Omega = D^2\). In fact, we could
define the probability space to be the smaller set Ω = {2, 3, ..., 12}, with
whatever probability distribution Pr(s) is desired. Then ‘S = 4’ would be
an elementary event. Thus we can often ignore the underlying probability
space Ω and work directly with random variables and their distributions.
If two random variables X and Y are defined over the same probability
space Ω, we can characterize their behavior without knowing everything
about Ω if we know the “joint distribution”
\[\Pr(X=x \text{ and } Y=y)\]
for each x in the range of X and each y in the range of Y. We say that X and
Y are independent random variables if
\[\Pr(X=x \text{ and } Y=y) = \Pr(X=x)\cdot\Pr(Y=y) \tag{8.5}\]
for all x and y. Intuitively, this means that the value of X has no effect on
the value of Y.
[Margin note: Just Say No.]
For example, if Ω is the set of dice rolls \(D^2\), we can let \(S_1\) be the number
of spots on the first die and \(S_2\) the number of spots on the second. Then
the random variables \(S_1\) and \(S_2\) are independent with respect to each of the
probability distributions \(\Pr_{00}\), \(\Pr_{11}\), and \(\Pr_{01}\) discussed earlier, because we
defined the dice probability for each elementary event dd′ as a product of a
probability for \(S_1 = d\) multiplied by a probability for \(S_2 = d'\). We could have
defined probabilities differently so that, say,
Pr(⚀⚄) / Pr(⚀⚅) ≠ Pr(⚁⚄) / Pr(⚁⚅);
but we didn’t do that, because different dice aren’t supposed to influence each
other. With our definitions, both of these ratios are \(\Pr(S_2=5)/\Pr(S_2=6)\).
We have defined S to be the sum of the two spot values, \(S_1 + S_2\). Let’s
consider another random variable P, the product \(S_1S_2\). Are S and P
independent? Informally, no; if we are told that S = 2, we know that P must be 1.
Formally, no again, because the independence condition (8.5) fails
spectacularly (at least in the case of fair dice): For all legal values of s and p, we
have \(0 < \Pr_{00}(S=s)\cdot\Pr_{00}(P=p) \le \frac16\cdot\frac19\); this can’t equal \(\Pr_{00}(S=s \text{ and } P=p)\),
which is a multiple of 1/36.
If we want to understand the typical behavior of a given random variable,
we often ask about its “average” value. But the notion of “average”
is ambiguous; people generally speak about three different kinds of averages
when a sequence of numbers is given:
• the mean (which is the sum of all values, divided by the number of
values);
• the median (which is the middle value, numerically);
• the mode (which is the value that occurs most often).
For example, the mean of (3, 1, 4, 1, 5) is (3 + 1 + 4 + 1 + 5)/5 = 2.8; the median is 3;
the mode is 1.
But probability theorists usually work with random variables instead of
with sequences of numbers, so we want to define the notion of an “average” for
random variables too. Suppose we repeat an experiment over and over again,
making independent trials in such a way that each value of X occurs with
a frequency approximately proportional to its probability. (For example, we
might roll a pair of dice many times, observing the values of S and/or P.) We’d
like to define the average value of a random variable so that such experiments
will usually produce a sequence of numbers whose mean, median, or mode is
approximately the same as the mean, median, or mode of X, according to our
definitions.
[Margin note: A dicey inequality.]
Here’s how it can be done: The mean of a random real-valued variable X
on a probability space Ω is defined to be
\[\sum_{x\in X(\Omega)} x\cdot\Pr(X=x) \tag{8.6}\]
if this potentially infinite sum exists. (Here X(Ω) stands for the set of all
values that X can assume.) The median of X is defined to be the set of all x
such that
\[\Pr(X\le x)\ge\tfrac12 \quad\text{and}\quad \Pr(X\ge x)\ge\tfrac12. \tag{8.7}\]
And the mode of X is defined to be the set of all x such that
\[\Pr(X=x)\ge\Pr(X=x') \quad\text{for all } x'\in X(\Omega). \tag{8.8}\]
In our dice-throwing example, the mean of S turns out to be
\(2\cdot\frac1{36}+3\cdot\frac2{36}+\cdots+12\cdot\frac1{36} = 7\) in distribution \(\Pr_{00}\), and it also turns out to be 7 in
distribution \(\Pr_{11}\). The median and mode both turn out to be {7} as well,
in both distributions. So S has the same average under all three definitions.
On the other hand the P in distribution \(\Pr_{00}\) turns out to have a mean value
of 49/4 = 12.25; its median is {10}, and its mode is {6, 12}. The mean of P is
unchanged if we load the dice with distribution \(\Pr_{11}\), but the median drops
to {8} and the mode becomes {6} alone.
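The three kinds of average in (8.6), (8.7), and (8.8) can be computed directly for S and P. The following Python sketch is an illustration with names of our own choosing (`distribution`, `mean`, `median`, `mode`). One simplification is assumed: the median is tested only at values that X actually takes, which is enough for these dice examples but would miss interior points of an interval of medians.

```python
from fractions import Fraction
from itertools import product

def distribution(weights, X):
    """Pr(X = x) for two independent dice, each face d having probability weights[d]."""
    dist = {}
    for d, e in product(weights, repeat=2):
        x = X(d, e)
        dist[x] = dist.get(x, 0) + weights[d] * weights[e]
    return dist

def mean(dist):
    """(8.6): sum of x * Pr(X = x)."""
    return sum(x * p for x, p in dist.items())

def median(dist):
    """(8.7), restricted to attained values x."""
    half = Fraction(1, 2)
    return {x for x in dist
            if sum(p for y, p in dist.items() if y <= x) >= half
            and sum(p for y, p in dist.items() if y >= x) >= half}

def mode(dist):
    """(8.8): the values of maximum probability."""
    top = max(dist.values())
    return {x for x, p in dist.items() if p == top}

if __name__ == "__main__":
    fair = {d: Fraction(1, 6) for d in range(1, 7)}
    loaded = {d: Fraction(1, 4) if d in (1, 6) else Fraction(1, 8) for d in range(1, 7)}
    for name, w in (("Pr_00", fair), ("Pr_11", loaded)):
        P = distribution(w, lambda d, e: d * e)
        print(name, mean(P), median(P), mode(P))
```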
Probability theorists have a special name and notation for the mean of a
random variable: They call it the expected value, and write
\[EX = \sum_{\omega\in\Omega} X(\omega)\,\Pr(\omega). \tag{8.9}\]
In our dice-throwing example, this sum has 36 terms (one for each element
of Ω), while (8.6) is a sum of only eleven terms. But both sums have the
same value, because they’re both equal to
\[\sum_{\substack{\omega\in\Omega\\ x\in X(\Omega)}} x\,\Pr(\omega)\,[x=X(\omega)].\]
The mean of a random variable turns out to be more meaningful in
applications than the other kinds of averages, so we shall largely forget about
medians and modes from now on. We will use the terms “expected value,”
“mean,” and “average” almost interchangeably in the rest of this chapter.
[Margin note: I get it: On average, “average” means “mean.”]
If X and Y are any two random variables defined on the same probability
space, then X + Y is also a random variable on that space. By formula (8.9),
the average of their sum is the sum of their averages:
\[E(X+Y) = \sum_{\omega\in\Omega}\bigl(X(\omega)+Y(\omega)\bigr)\Pr(\omega) = EX + EY. \tag{8.10}\]
Similarly, if α is any constant we have the simple rule
\[E(\alpha X) = \alpha\,EX. \tag{8.11}\]
But the corresponding rule for multiplication of random variables is more
complicated in general; the expected value is defined as a sum over elementary
events, and sums of products don’t often have a simple form. In spite of this
difficulty, there is a very nice formula for the mean of a product in the special
case that the random variables are independent:
\[E(XY) = (EX)(EY), \qquad\text{if X and Y are independent.} \tag{8.12}\]
We can prove this by the distributive law for products,
\[\begin{aligned}
E(XY) &= \sum_{\omega\in\Omega} X(\omega)\,Y(\omega)\cdot\Pr(\omega)\\
&= \sum_{\substack{x\in X(\Omega)\\ y\in Y(\Omega)}} xy\cdot\Pr(X=x \text{ and } Y=y)\\
&= \sum_{\substack{x\in X(\Omega)\\ y\in Y(\Omega)}} xy\cdot\Pr(X=x)\,\Pr(Y=y)\\
&= \sum_{x\in X(\Omega)} x\,\Pr(X=x)\cdot\sum_{y\in Y(\Omega)} y\,\Pr(Y=y) = (EX)(EY).
\end{aligned}\]
For example, we know that \(S = S_1 + S_2\) and \(P = S_1 S_2\), when \(S_1\) and \(S_2\) are
the numbers of spots on the first and second of a pair of random dice. We have
\(ES_1 = ES_2 = \frac72\), hence \(ES = 7\); furthermore \(S_1\) and \(S_2\) are independent, so
\(EP = \frac72\cdot\frac72 = \frac{49}{4}\), as claimed earlier. We also have
\(E(S+P) = ES + EP = 7 + \frac{49}{4}\). But S and P are not independent, so we cannot
assert that \(E(SP) = 7\cdot\frac{49}{4} = \frac{343}{4}\). In fact, the expected value of SP
turns out to equal \(\frac{637}{6}\) in distribution \(\Pr_{00}\), 112 (exactly) in distribution \(\Pr_{11}\).
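The values quoted in this example can be checked by brute force over the 36 outcomes. The sketch below is only an illustration: the helper name `expectation` is hypothetical, and the loaded die is assumed to give faces 1 and 6 probability 1/4 each and the other faces 1/8 each, which is the weighting that appears in the loaded-die variance formula later in this section.

```python
from fractions import Fraction
from itertools import product

def expectation(die, f):
    """E f(S1, S2) for two independent dice sharing the face distribution `die`."""
    return sum(p1 * p2 * f(a, b)
               for (a, p1), (b, p2) in product(die.items(), repeat=2))

fair = {face: Fraction(1, 6) for face in range(1, 7)}
# Assumed loading: faces 1 and 6 have weight 2/8, faces 2..5 have weight 1/8.
loaded = {face: Fraction(2 if face in (1, 6) else 1, 8) for face in range(1, 7)}

results = {}
for name, die in (("fair", fair), ("loaded", loaded)):
    ES = expectation(die, lambda a, b: a + b)             # mean of the sum S
    EP = expectation(die, lambda a, b: a * b)             # mean of the product P
    ESP = expectation(die, lambda a, b: (a + b) * a * b)  # mean of S*P
    results[name] = (ES, EP, ESP)
```

Exact rational arithmetic is used so that E(SP) can be compared directly with (ES)(EP) = 343/4, which it does not equal in either distribution.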
[Marginal note: (Slightly subtle point: There are two probability spaces, depending on what
strategy we use; but \(EX_1\) and \(EX_2\) are the same in both.)]
8.2 MEAN AND VARIANCE
The next most important property of a random variable, after we
know its expected value, is its variance, defined as the mean square deviation
from the mean:
\[ VX = E\bigl((X - EX)^2\bigr). \tag{8.13} \]
If we denote EX by \(\mu\), the variance VX is the expected value of \((X-\mu)^2\). This
measures the “spread” of X’s distribution.
As a simple example of variance computation, let’s suppose we have just
been made an offer we can’t refuse: Someone has given us two gift certificates
for a certain lottery. The lottery organizers sell 100 tickets for each weekly
drawing. One of these tickets is selected by a uniformly random process-
that is, each ticket is equally likely to be chosen-and the lucky ticket holder
wins a hundred million dollars. The other 99 ticket holders win nothing.
We can use our gift in two ways: Either we buy two tickets in the same
lottery, or we buy one ticket in each of two lotteries. Which is a better
strategy? Let’s try to analyze this by letting \(X_1\) and \(X_2\) be random variables
that represent the amount we win on our first and second ticket. The expected
value of \(X_1\), in millions, is
\[ EX_1 = \tfrac{99}{100}\cdot 0 + \tfrac{1}{100}\cdot 100 = 1, \]
and the same holds for \(X_2\). Expected values are additive, so our average total
winnings will be
\[ E(X_1 + X_2) = EX_1 + EX_2 = 2 \text{ million dollars}, \]
regardless of which strategy we adopt.
Still, the two strategies seem different. Let’s look beyond expected values
and study the exact probability distribution of \(X_1 + X_2\):

    winnings (millions)       0       100      200
    same drawing            .9800    .0200
    different drawings      .9801    .0198    .0001

If we buy two tickets in the same lottery we have a 98% chance of winning
nothing and a 2% chance of winning $100 million. If we buy them in different
lotteries we have a 98.01% chance of winning nothing, so this is slightly more
likely than before; and we have a 0.01% chance of winning $200 million, also
slightly more likely than before; and our chances of winning $100 million are
now 1.98%. So the distribution of \(X_1 + X_2\) in this second situation is slightly
more spread out; the middle value, $100 million, is slightly less likely, but the
extreme values are slightly more likely.
It’s this notion of the spread of a random variable that the variance is
intended to capture. We measure the spread in terms of the squared deviation
of the random variable from its mean. In case 1, the variance is therefore
\[ .98(0M - 2M)^2 + .02(100M - 2M)^2 = 196M^2; \]
in case 2 it is
\[ .9801(0M - 2M)^2 + .0198(100M - 2M)^2 + .0001(200M - 2M)^2 = 198M^2. \]
As we expected, the latter variance is slightly larger, because the distribution
of case 2 is slightly more spread out.
When we work with variances, everything is squared, so the numbers can
get pretty big. (The factor \(M^2\) is one trillion, which is somewhat imposing
even for high-stakes gamblers.) To convert the numbers back to the more
meaningful original scale, we often take the square root of the variance. The
resulting number is called the standard deviation, and it is usually denoted
by the Greek letter \(\sigma\):
\[ \sigma = \sqrt{VX}. \tag{8.14} \]
[Marginal note: Interesting: The variance of a dollar amount is expressed in units of square
dollars.]
The standard deviations of the random variables \(X_1 + X_2\) in our two lottery
strategies are \(\sqrt{196M^2} = 14.00M\) and \(\sqrt{198M^2} \approx 14.071247M\). In some sense
the second alternative is about $71,247 riskier.
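The lottery figures are easy to reproduce from the two distributions in the table. The following is a minimal sketch, with amounts in millions of dollars; the function name `mean_var` is an illustrative choice, not notation from the text.

```python
from fractions import Fraction
from math import sqrt

def mean_var(dist):
    """Mean and variance of a finite distribution given as {value: probability}."""
    mean = sum(p * x for x, p in dist.items())
    var = sum(p * (x - mean) ** 2 for x, p in dist.items())  # definition (8.13)
    return mean, var

# Winnings in millions of dollars, taken from the table above.
same = {0: Fraction(98, 100), 100: Fraction(2, 100)}
different = {0: Fraction(9801, 10000), 100: Fraction(198, 10000),
             200: Fraction(1, 10000)}

mean_same, var_same = mean_var(same)
mean_diff, var_diff = mean_var(different)
sd_same, sd_diff = sqrt(var_same), sqrt(var_diff)  # standard deviations, in millions
```

Both strategies have mean 2, while the variances come out as 196 and 198 (in units of M squared), so the difference in standard deviation is about 0.071247 million dollars.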
How does the variance help us choose a strategy? It’s not clear. The
strategy with higher variance is a little riskier; but do we get the most for our
money by taking more risks or by playing it safe? Suppose we had the chance
to buy 100 tickets instead of only two. Then we could have a guaranteed
victory in a single lottery (and the variance would be zero); or we could
gamble on a hundred different lotteries, with a \(.99^{100} \approx .366\) chance of winning
nothing but also with a nonzero probability of winning up to $10,000,000,000.
To decide between these alternatives is beyond the scope of this book; all we
can do here is explain how to do the calculations.
In fact, there is a simpler way to calculate the variance, instead of using
the definition (8.13). (We suspect that there must be something going on
in the mathematics behind the scenes, because the variances in the lottery
example magically came out to be integer multiples of \(M^2\).) We have
\[ \begin{aligned}
E\bigl((X - EX)^2\bigr) &= E\bigl(X^2 - 2X(EX) + (EX)^2\bigr)\\
&= E(X^2) - 2(EX)(EX) + (EX)^2,
\end{aligned} \]
since (EX) is a constant; hence
\[ VX = E(X^2) - (EX)^2. \tag{8.15} \]
“The variance is the mean of the square minus the square of the mean.”
[Marginal note: Another way to reduce risk might be to bribe the lottery officials. I guess
that’s where probability becomes indiscreet. (N.B.: Opinions expressed in these margins
do not necessarily represent the opinions of the management.)]
For example, the mean of \((X_1 + X_2)^2\) comes to \(.98(0M)^2 + .02(100M)^2 = 200M^2\)
or to \(.9801(0M)^2 + .0198(100M)^2 + .0001(200M)^2 = 202M^2\) in the
lottery problem. Subtracting \(4M^2\) (the square of the mean) gives the results
we obtained the hard way.
There’s an even easier formula yet, if we want to calculate V(X + Y) when
X and Y are independent: We have
\[ \begin{aligned}
E\bigl((X+Y)^2\bigr) &= E(X^2 + 2XY + Y^2)\\
&= E(X^2) + 2(EX)(EY) + E(Y^2),
\end{aligned} \]
since we know that E(XY) = (EX)(EY) in the independent case. Therefore
\[ \begin{aligned}
V(X+Y) &= E\bigl((X+Y)^2\bigr) - (EX + EY)^2\\
&= E(X^2) + 2(EX)(EY) + E(Y^2) - (EX)^2 - 2(EX)(EY) - (EY)^2\\
&= E(X^2) - (EX)^2 + E(Y^2) - (EY)^2\\
&= VX + VY.
\end{aligned} \tag{8.16} \]
“The variance of a sum of independent random variables is the sum of their
variances.” For example, the variance of the amount we can win with a single
lottery ticket is
\[ E(X_1^2) - (EX_1)^2 = .99(0M)^2 + .01(100M)^2 - (1M)^2 = 99M^2. \]
Therefore the variance of the total winnings of two lottery tickets in two
separate (independent) lotteries is \(2\times 99M^2 = 198M^2\). And the corresponding
variance for n independent lottery tickets is \(n\times 99M^2\).
The variance of the dice-roll sum S drops out of this same formula, since
\(S = S_1 + S_2\) is the sum of two independent random variables. We have
\[ VS_1 = \tfrac16(1^2+2^2+3^2+4^2+5^2+6^2) - \bigl(\tfrac72\bigr)^2 = \tfrac{35}{12} \]
when the dice are fair; hence \(VS = \frac{35}{12} + \frac{35}{12} = \frac{35}{6}\). The loaded die has
\[ VS_1 = \tfrac18(2\cdot1^2+2^2+3^2+4^2+5^2+2\cdot6^2) - \bigl(\tfrac72\bigr)^2 = \tfrac{15}{4}; \]
hence \(VS = \frac{15}{2} = 7.5\) when both dice are loaded. Notice that the loaded dice
give S a larger variance, although S actually assumes its average value 7 more
often than it would with fair dice. If our goal is to shoot lots of lucky 7’s, the
variance is not our best indicator of success.
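The remark that loaded dice raise the variance while also making 7 more likely can be confirmed with a few lines of exact arithmetic. This is a sketch under the same assumption as the variance formula above (faces 1 and 6 carry weight 2/8, the rest 1/8); the function names are illustrative.

```python
from fractions import Fraction

def variance(die):
    """VX = E(X^2) - (EX)^2, as in (8.15), for a die given as {face: probability}."""
    mean = sum(p * x for x, p in die.items())
    return sum(p * x * x for x, p in die.items()) - mean ** 2

def prob_sum_is_7(die):
    """Pr(S1 + S2 = 7) for two independent dice with the same distribution."""
    return sum(die[a] * die[7 - a] for a in die)

fair = {face: Fraction(1, 6) for face in range(1, 7)}
loaded = {face: Fraction(2 if face in (1, 6) else 1, 8) for face in range(1, 7)}

# Variances add for independent dice, by (8.16).
VS_fair = 2 * variance(fair)
VS_loaded = 2 * variance(loaded)
```

With these weights the chance of a 7 rises from 1/6 to 3/16 even though VS grows from 35/6 to 15/2.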
OK, we have learned how to compute variances. But we haven’t really
seen a good reason why the variance is a natural thing to compute. Everybody
does it, but why? The main reason is Chebyshev’s inequality ([24’] and
[50’]), which states that the variance has a significant property:
\[ \Pr\bigl((X-EX)^2 \ge \alpha\bigr) \le VX/\alpha, \qquad\text{for all } \alpha > 0. \tag{8.17} \]
[Marginal note: If he proved it in 1867, it’s a classic ’67 Chebyshev.]
(This is different from the summation inequalities of Chebyshev that we en-
countered in Chapter 2.) Very roughly, (8.17) tells us that a random variable X
will rarely be far from its mean EX if its variance VX is small. The proof is
amazingly simple. We have
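\[ \begin{aligned}
VX &= \sum_{\omega\in\Omega} \bigl(X(\omega)-EX\bigr)^2 \Pr(\omega)\\
&\ge \sum_{\substack{\omega\in\Omega\\ (X(\omega)-EX)^2\ge\alpha}} \bigl(X(\omega)-EX\bigr)^2 \Pr(\omega)\\
&\ge \sum_{\substack{\omega\in\Omega\\ (X(\omega)-EX)^2\ge\alpha}} \alpha\Pr(\omega)
 \;=\; \alpha\cdot\Pr\bigl((X-EX)^2\ge\alpha\bigr);
\end{aligned} \]
dividing by \(\alpha\) finishes the proof.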
If we write \(\mu\) for the mean and \(\sigma\) for the standard deviation, and if we
replace \(\alpha\) by \(c^2 VX\) in (8.17), the condition \((X-EX)^2 \ge c^2 VX\) is the same as
\((X-\mu)^2 \ge (c\sigma)^2\); hence (8.17) says that
\[ \Pr\bigl(|X-\mu| \ge c\sigma\bigr) \le 1/c^2. \tag{8.18} \]
Thus, X will lie within c standard deviations of its mean value except with
probability at most \(1/c^2\). A random variable will lie within \(2\sigma\) of \(\mu\) at least
75% of the time; it will lie between \(\mu - 10\sigma\) and \(\mu + 10\sigma\) at least 99% of the
time. These are the cases \(\alpha = 4VX\) and \(\alpha = 100VX\) of Chebyshev’s inequality.
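To see how conservative (8.18) can be, one can compare the exact tail probabilities of the fair-dice sum S with the bound 1/c². The sketch below is an illustration only; the names `dist`, `tail` and `checks` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Exact distribution of S = S1 + S2 for two fair dice.
dist = {}
for a, b in product(range(1, 7), repeat=2):
    dist[a + b] = dist.get(a + b, 0) + Fraction(1, 36)

mu = sum(p * s for s, p in dist.items())
var = sum(p * (s - mu) ** 2 for s, p in dist.items())

def tail(c):
    """Pr(|S - mu| >= c*sigma), computed exactly as Pr((S - mu)^2 >= c^2 * VS)."""
    return sum(p for s, p in dist.items() if (s - mu) ** 2 >= c * c * var)

# (c, exact tail probability, Chebyshev bound 1/c^2)
checks = [(c, tail(c), Fraction(1, c * c)) for c in (1, 2, 3)]
```

Working with squared deviations avoids taking a square root, so every comparison stays in exact rational arithmetic.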
If we roll a pair of fair dice n times, the total value of the n rolls will
almost always be near 7n, for large n. Here’s why: The variance of n independent
rolls is \(\frac{35}{6}n\). A variance of \(\frac{35}{6}n\) means a standard deviation of only
\[ \sqrt{\tfrac{35}{6}n}. \]
So Chebyshev’s inequality tells us that the final sum will lie between
\[ 7n - 10\sqrt{\tfrac{35}{6}n} \quad\text{and}\quad 7n + 10\sqrt{\tfrac{35}{6}n} \]
in at least 99% of all experiments when a pair of fair dice is rolled n times. For example,
the odds are better than 99 to 1 that the total value of a million rolls will be
between 6.976 million and 7.024 million.
[Marginal note: (That is, the average will fall between the stated limits in at least 99% of
all cases when we look at a set of n independent samples, for any fixed value of n. Don’t
misunderstand this as a statement about the averages of an infinite sequence
\(X_1, X_2, X_3, \ldots\) as n varies.)]
In general, let X be any random variable over a probability space \(\Omega\), having
finite mean \(\mu\) and finite standard deviation \(\sigma\). Then we can consider the
probability space \(\Omega^n\) whose elementary events are n-tuples
\((\omega_1, \omega_2, \ldots, \omega_n)\) with each \(\omega_k \in \Omega\), and whose probabilities are
\[ \Pr(\omega_1, \omega_2, \ldots, \omega_n) = \Pr(\omega_1)\Pr(\omega_2)\ldots\Pr(\omega_n). \]
If we now define random variables \(X_k\) by the formula
\[ X_k(\omega_1, \omega_2, \ldots, \omega_n) = X(\omega_k), \]
the quantity
\[ X_1 + X_2 + \cdots + X_n \]
is a sum of n independent random variables, which corresponds to taking n
independent “samples” of X on \(\Omega\) and adding them together. The mean of
\(X_1 + X_2 + \cdots + X_n\) is \(n\mu\), and the standard deviation is \(\sqrt{n}\,\sigma\); hence the average
of the n samples,
\[ \tfrac1n (X_1 + X_2 + \cdots + X_n), \]
will lie between \(\mu - 10\sigma/\sqrt{n}\) and \(\mu + 10\sigma/\sqrt{n}\) at least 99% of the time. In
other words, if we choose a large enough value of n, the average of n independent
samples will almost always be very near the expected value EX. (An
even stronger theorem called the Strong Law of Large Numbers is proved in
textbooks of probability theory; but the simple consequence of Chebyshev’s
inequality that we have just derived is enough for our purposes.)
Sometimes we don’t know the characteristics of a probability space, and
we want to estimate the mean of a random variable X by sampling its value
repeatedly. (For example, we might want to know the average temperature
at noon on a January day in San Francisco; or we may wish to know the
mean life expectancy of insurance agents.) If we have obtained independent
empirical observations \(X_1, X_2, \ldots, X_n\), we can guess that the true mean is
approximately
\[ \hat{E}X = \frac{X_1 + X_2 + \cdots + X_n}{n}. \tag{8.19} \]
And we can also make an estimate of the variance, using the formula
\[ \hat{V}X = \frac{X_1^2 + X_2^2 + \cdots + X_n^2}{n-1} - \frac{(X_1 + X_2 + \cdots + X_n)^2}{n(n-1)}. \tag{8.20} \]
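The (n − 1)’s in this formula look like typographic errors; it seems they should
be n’s, as in (8.19), because the true variance VX is defined by expected values
in (8.15). Yet we get a better estimate with n − 1 instead of n here, because
definition (8.20) implies that
\[ E(\hat{V}X) = VX. \tag{8.21} \]
Here’s why:
\[ \begin{aligned}
E(\hat{V}X) &= \frac{1}{n-1}\, E\Bigl(\sum_{k=1}^{n} X_k^2 - \frac1n \sum_{j=1}^{n}\sum_{k=1}^{n} X_j X_k\Bigr)\\
&= \frac{1}{n-1} \Bigl(\sum_{k=1}^{n} E(X_k^2) - \frac1n \sum_{j=1}^{n}\sum_{k=1}^{n} E(X_j X_k)\Bigr)\\
&= \frac{1}{n-1} \Bigl(\sum_{k=1}^{n} E(X^2) - \frac1n \sum_{j=1}^{n}\sum_{k=1}^{n} \bigl(E(X)^2[j\ne k] + E(X^2)[j=k]\bigr)\Bigr)\\
&= \frac{1}{n-1} \Bigl(n E(X^2) - \frac1n \bigl(n E(X^2) + n(n-1) E(X)^2\bigr)\Bigr)\\
&= E(X^2) - E(X)^2 = VX.
\end{aligned} \]
(This derivation uses the independence of the observations when it replaces
\(E(X_j X_k)\) by \((EX)^2[j\ne k] + E(X^2)[j=k]\).)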
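Identity (8.21) can be checked exactly on a small case by averaging the estimator (8.20) over every possible sample. The sketch below assumes samples of size n = 3 from a fair die; the names are illustrative and the enumeration is only feasible because the sample space is tiny.

```python
from fractions import Fraction
from itertools import product

def sample_variance(xs):
    """The estimate (8.20), with n - 1 and n(n - 1) in the denominators."""
    n = len(xs)
    return (Fraction(sum(x * x for x in xs), n - 1)
            - Fraction(sum(xs) ** 2, n * (n - 1)))

faces = range(1, 7)
n = 3
samples = list(product(faces, repeat=n))   # all 6^3 equally likely samples

# Expected value of the estimator, averaged over every possible sample.
expected_estimate = sum(sample_variance(xs) for xs in samples) / len(samples)

# True variance of one fair die, from (8.15).
true_variance = Fraction(sum(x * x for x in faces), 6) - Fraction(sum(faces), 6) ** 2
```

Replacing n − 1 by n in `sample_variance` would scale the average by (n − 1)/n, which is the bias that the n − 1 denominators remove.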
In practice, experimental results about a random variable X are usually
obtained by calculating a sample mean \(\hat\mu = \hat{E}X\) and a sample standard
deviation \(\hat\sigma = \sqrt{\hat{V}X}\), and presenting the answer in the form
‘\(\hat\mu \pm \hat\sigma/\sqrt{n}\)’. For example, here are ten rolls of two supposedly fair dice:
[Figure: ten rolls of a pair of dice, with spot sums 7, 11, 8, 5, 4, 6, 10, 8, 8, 7.]
The sample mean of the spot sum S is
\[ \hat\mu = (7+11+8+5+4+6+10+8+8+7)/10 = 7.4; \]
the sample variance is
\[ (7^2+11^2+8^2+5^2+4^2+6^2+10^2+8^2+8^2+7^2-10\hat\mu^2)/9 \approx 2.1^2. \]
We estimate the average spot sum of these dice to be
\(7.4 \pm 2.1/\sqrt{10} = 7.4 \pm 0.7\), on the basis of these experiments.
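The arithmetic of this example is a direct application of (8.19) and (8.20). A minimal sketch, using the ten spot sums listed in the sample mean above (variable names are illustrative):

```python
from math import sqrt

rolls = [7, 11, 8, 5, 4, 6, 10, 8, 8, 7]   # spot sums of the ten rolls
n = len(rolls)

mean_hat = sum(rolls) / n                                   # (8.19)
var_hat = (sum(x * x for x in rolls) / (n - 1)
           - sum(rolls) ** 2 / (n * (n - 1)))               # (8.20)
sd_hat = sqrt(var_hat)          # sample standard deviation
error = sd_hat / sqrt(n)        # the "plus or minus" part of the report
```

Rounded to one decimal place, `sd_hat` and `error` give the 2.1 and 0.7 quoted in the text.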
Let’s work one more example of means and variances, in order to show
how they can be calculated theoretically instead of empirically. One of the
questions we considered in Chapter 5 was the “football victory problem,”
where n hats are thrown into the air and the result is a random permutation
of hats. We showed in equation (5.51) that there’s a probability of \(n¡/n! \approx 1/e\)
that nobody gets the right hat back. We also derived the formula
\[ P(n,k) = \frac{1}{n!}\binom{n}{k}(n-k)¡ = \frac{1}{k!}\,\frac{(n-k)¡}{(n-k)!} \tag{8.22} \]
for the probability that exactly k people end up with their own hats.
Restating these results in the formalism just learned, we can consider the
probability space \(\Pi_n\) of all n! permutations \(\pi\) of {1, 2, ..., n}, where
\(\Pr(\pi) = 1/n!\) for all \(\pi \in \Pi_n\). The random variable
\[ F_n(\pi) = \text{number of “fixed points” of } \pi, \qquad\text{for } \pi\in\Pi_n, \]
measures the number of correct hat-falls in the football victory problem.
[Marginal note: Not to be confused with a Fibonacci number.]
Equation (8.22) gives \(\Pr(F_n = k)\), but let’s pretend that we don’t know any
such formula; we merely want to study the average value of \(F_n\), and its
standard deviation.
The average value is, in fact, extremely easy to calculate, avoiding all the
complexities of Chapter 5. We simply observe that
\[ \begin{aligned}
F_n(\pi) &= F_{n,1}(\pi) + F_{n,2}(\pi) + \cdots + F_{n,n}(\pi),\\
F_{n,k}(\pi) &= [\text{position } k \text{ of } \pi \text{ is a fixed point}], \qquad\text{for } \pi\in\Pi_n.
\end{aligned} \]
Hence
\[ EF_n = EF_{n,1} + EF_{n,2} + \cdots + EF_{n,n}. \]
And the expected value of \(F_{n,k}\) is simply the probability that \(F_{n,k} = 1\), which
is 1/n because exactly (n − 1)! of the n! permutations
\(\pi = \pi_1\pi_2\ldots\pi_n \in \Pi_n\) have \(\pi_k = k\). Therefore
\[ EF_n = n/n = 1, \qquad\text{for } n > 0. \tag{8.23} \]
[Marginal note: One the average.]
On the average, one hat will be in its correct place. “A random permutation
has one fixed point, on the average.”
Now what’s the standard deviation? This question is more difficult, because
the \(F_{n,k}\)’s are not independent of each other. But we can calculate the
variance by analyzing the mutual dependencies among them:
\[ \begin{aligned}
E(F_n^2) &= E\Bigl(\Bigl(\sum_{k=1}^{n} F_{n,k}\Bigr)^2\Bigr)
 = E\Bigl(\sum_{j=1}^{n}\sum_{k=1}^{n} F_{n,j}F_{n,k}\Bigr)\\
&= \sum_{j=1}^{n}\sum_{k=1}^{n} E(F_{n,j}F_{n,k})
 = \sum_{1\le k\le n} E(F_{n,k}^2) + 2\sum_{1\le j<k\le n} E(F_{n,j}F_{n,k}).
\end{aligned} \]
(We used a similar trick when we derived (2.33) in Chapter 2.) Now \(F_{n,k}^2 = F_{n,k}\),
since \(F_{n,k}\) is either 0 or 1; hence \(E(F_{n,k}^2) = EF_{n,k} = 1/n\) as before. And
if j < k we have \(E(F_{n,j}F_{n,k}) = \Pr(\pi\) has both j and k as fixed points\() = (n-2)!/n! = 1/n(n-1)\). Therefore
\[ E(F_n^2) = \frac{n}{n} + \binom{n}{2}\frac{2}{n(n-1)} = 2, \qquad\text{for } n\ge 2. \tag{8.24} \]
(As a check when n = 3, we have \(\frac26 0^2 + \frac36 1^2 + \frac06 2^2 + \frac16 3^2 = 2\).) The variance
is \(E(F_n^2) - (EF_n)^2 = 1\), so the standard deviation (like the mean) is 1. “A
random permutation of n ≥ 2 elements has 1 ± 1 fixed points.”
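For small n the claim “1 ± 1 fixed points” can be confirmed by listing every permutation. The sketch below is a brute-force illustration (it enumerates all n! permutations, so it is only practical for small n); the function name is hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def fixed_point_stats(n):
    """Exact mean and variance of the number of fixed points of a random
    permutation of n elements, by enumerating all n! permutations."""
    counts = [sum(1 for i, v in enumerate(perm) if i == v)
              for perm in permutations(range(n))]
    total = len(counts)
    mean = Fraction(sum(counts), total)
    second_moment = Fraction(sum(c * c for c in counts), total)
    return mean, second_moment - mean ** 2

stats = {n: fixed_point_stats(n) for n in range(1, 7)}
```

The case n = 1 is the exception allowed by (8.24): the single permutation always has one fixed point, so the variance is 0 there.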
8.3 PROBABILITY GENERATING FUNCTIONS
If X is a random variable that takes only nonnegative integer values,
we can capture its probability distribution nicely by using the techniques of
Chapter 7. The probability generating function or pgf of X is
\[ G_X(z) = \sum_{k\ge0} \Pr(X=k)\,z^k. \tag{8.25} \]
This power series in z contains all the information about the random variable X.
We can also express it in two other ways:
\[ G_X(z) = \sum_{\omega\in\Omega} \Pr(\omega)\,z^{X(\omega)} = E(z^X). \tag{8.26} \]
The coefficients of \(G_X(z)\) are nonnegative, and they sum to 1; the latter
condition can be written
\[ G_X(1) = 1. \tag{8.27} \]
Conversely, any power series G(z) with nonnegative coefficients and with
G(1) = 1 is the pgf of some random variable.
The nicest thing about pgf’s is that they usually simplify the computation
of means and variances. For example, the mean is easily expressed:
\[ \begin{aligned}
EX &= \sum_{k\ge0} k\cdot\Pr(X=k)\\
&= \sum_{k\ge0} \Pr(X=k)\cdot k z^{k-1}\Big|_{z=1}\\
&= G_X'(1).
\end{aligned} \tag{8.28} \]
We simply differentiate the pgf with respect to z and set z = 1.
The variance is only slightly more complicated:
\[ \begin{aligned}
E(X^2) &= \sum_{k\ge0} k^2\cdot\Pr(X=k)\\
&= \sum_{k\ge0} \Pr(X=k)\cdot\bigl(k(k-1)z^{k-2} + k z^{k-1}\bigr)\Big|_{z=1} = G_X''(1) + G_X'(1).
\end{aligned} \]
Therefore
\[ VX = G_X''(1) + G_X'(1) - G_X'(1)^2. \tag{8.29} \]
Equations (8.28) and (8.29) tell us that we can compute the mean and variance
if we can compute the values of two derivatives, \(G_X'(1)\) and \(G_X''(1)\). We don’t
have to know a closed form for the probabilities; we don’t even have to know
a closed form for \(G_X(z)\) itself.
It is convenient to write
\[ \mathrm{Mean}(G) = G'(1), \tag{8.30} \]
\[ \mathrm{Var}(G) = G''(1) + G'(1) - G'(1)^2, \tag{8.31} \]
when G is any function, since we frequently want to compute these combinations
of derivatives.
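When G is a polynomial, the operators (8.30) and (8.31) reduce to simple sums over the coefficients. The following sketch represents a pgf as a Python list whose entry k is the coefficient of z^k; that representation and the names `pgf_mean` and `pgf_var` are assumptions made for illustration.

```python
from fractions import Fraction

def pgf_mean(coeffs):
    """Mean(G) = G'(1) for G(z) = sum of coeffs[k] * z^k."""
    return sum(k * c for k, c in enumerate(coeffs))

def pgf_var(coeffs):
    """Var(G) = G''(1) + G'(1) - G'(1)^2, as in (8.31)."""
    g1 = pgf_mean(coeffs)
    g2 = sum(k * (k - 1) * c for k, c in enumerate(coeffs))  # G''(1)
    return g2 + g1 - g1 ** 2

# pgf of the spots on one fair die: (z + z^2 + ... + z^6)/6
die = [0] + [Fraction(1, 6)] * 6
```

For the fair die this gives mean 7/2 and variance 35/12, agreeing with the values found earlier from the definitions.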
The second-nicest thing about pgf’s is that they are comparatively simple
functions of z, in many important cases. For example, let’s look at the
uniform distribution of order n, in which the random variable takes on each
of the values {0, 1, ..., n − 1} with probability 1/n. The pgf in this case is
\[ U_n(z) = \frac1n(1 + z + \cdots + z^{n-1}) = \frac1n\,\frac{1-z^n}{1-z}, \qquad\text{for } n\ge1. \tag{8.32} \]
We have a closed form for \(U_n(z)\) because this is a geometric series.
But this closed form proves to be somewhat embarrassing: When we plug
in z = 1 (the value of z that’s most critical for the pgf), we get the undefined
ratio 0/0, even though \(U_n(z)\) is a polynomial that is perfectly well defined
at any value of z. The value \(U_n(1) = 1\) is obvious from the non-closed form
\((1 + z + \cdots + z^{n-1})/n\), yet it seems that we must resort to L’Hospital’s rule
to find \(\lim_{z\to1} U_n(z)\) if we want to determine \(U_n(1)\) from the closed form.
The determination of \(U_n'(1)\) by L’Hospital’s rule will be even harder, because
there will be a factor of \((z-1)^2\) in the denominator; \(U_n''(1)\) will be harder still.
Luckily there’s a nice way out of this dilemma. If \(G(z) = \sum_{n\ge0} g_n z^n\) is
any power series that converges for at least one value of z with |z| > 1, the
power series \(G'(z) = \sum_{n\ge0} n g_n z^{n-1}\) will also have this property, and so will
G''(z), G'''(z), etc. Therefore by Taylor’s theorem we can write
\[ G(1+t) = G(1) + \frac{G'(1)}{1!}t + \frac{G''(1)}{2!}t^2 + \frac{G'''(1)}{3!}t^3 + \cdots; \tag{8.33} \]
all derivatives of G(z) at z = 1 will appear as coefficients, when G(1 + t) is
expanded in powers of t.
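For example, the derivatives of the uniform pgf \(U_n(z)\) are easily found
in this way:
\[ \begin{aligned}
U_n(1+t) &= \frac1n\,\frac{(1+t)^n - 1}{t}\\
&= \frac1n\binom{n}{1} + \frac1n\binom{n}{2}t + \frac1n\binom{n}{3}t^2 + \cdots + \frac1n\binom{n}{n}t^{n-1}.
\end{aligned} \]
Comparing this to (8.33) gives
\[ U_n(1) = 1; \qquad U_n'(1) = \frac{n-1}{2}; \qquad U_n''(1) = \frac{(n-1)(n-2)}{3}; \tag{8.34} \]
and in general \(U_n^{(m)}(1) = (n-1)^{\underline{m}}/(m+1)\), although we need only the cases
m = 1 and m = 2 to compute the mean and the variance. The mean of the
uniform distribution is
\[ U_n'(1) = \frac{n-1}{2}, \tag{8.35} \]
and the variance is
\[ U_n''(1) + U_n'(1) - U_n'(1)^2 = 4\,\frac{(n-1)(n-2)}{12} + 6\,\frac{(n-1)}{12} - 3\,\frac{(n-1)^2}{12} = \frac{n^2-1}{12}. \tag{8.36} \]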
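The Taylor-coefficient shortcut can be compared against direct differentiation of the polynomial form of the uniform pgf. This is a sketch with hypothetical function names; `falling` computes the falling factorial power used in the general formula for the mth derivative.

```python
from fractions import Fraction
from math import comb, factorial

def falling(k, m):
    """k(k-1)...(k-m+1), the falling factorial power of k."""
    result = 1
    for i in range(m):
        result *= k - i
    return result

def derivative_at_1_direct(n, m):
    """m-th derivative of U_n(z) = (1 + z + ... + z^(n-1))/n at z = 1,
    differentiating the polynomial term by term."""
    return Fraction(sum(falling(k, m) for k in range(n)), n)

def derivative_at_1_taylor(n, m):
    """m! times the coefficient of t^m in U_n(1+t), which is C(n, m+1)/n."""
    return Fraction(factorial(m) * comb(n, m + 1), n)

def uniform_mean_var(n):
    """Mean and variance of the uniform distribution of order n, via (8.30)-(8.31)."""
    u1 = derivative_at_1_taylor(n, 1)
    u2 = derivative_at_1_taylor(n, 2)
    return u1, u2 + u1 - u1 ** 2
```

Both routes agree, and neither needs L’Hospital’s rule; the Taylor form simply reads the derivatives off binomial coefficients.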
The third-nicest thing about pgf’s is that the product of pgf’s corresponds
to the sum of independent random variables. We learned in Chapters 5 and 7
that the product of generating functions corresponds to the convolution of
sequences; but it’s even more important in applications to know that the
convolution of probabilities corresponds to the sum of independent random
variables. Indeed, if X and Y are random variables that take on nothing but
integer values, the probability that X + Y = n is
\[ \Pr(X+Y=n) = \sum_k \Pr(X=k \text{ and } Y=n-k). \]
If X and Y are independent, we now have
\[ \Pr(X+Y=n) = \sum_k \Pr(X=k)\,\Pr(Y=n-k), \]
a convolution. Therefore (and this is the punch line)
\[ G_{X+Y}(z) = G_X(z)\,G_Y(z), \qquad\text{if X and Y are independent.} \tag{8.37} \]
Earlier this chapter we observed that V(X + Y) = VX + VY when X and Y are
independent. Let F(z) and G(z) be the pgf’s for X and Y, and let H(z) be the
pgf for X + Y. Then
\[ H(z) = F(z)\,G(z), \]
and our formulas (8.28) through (8.31) for mean and variance tell us that we
must have
\[ \mathrm{Mean}(H) = \mathrm{Mean}(F) + \mathrm{Mean}(G); \tag{8.38} \]
\[ \mathrm{Var}(H) = \mathrm{Var}(F) + \mathrm{Var}(G). \tag{8.39} \]
These formulas, which are properties of the derivatives Mean(H) = H'(1) and
Var(H) = H''(1) + H'(1) − H'(1)^2, aren’t valid for arbitrary function products
H(z) = F(z)G(z); we have
\[ \begin{aligned}
H'(z) &= F'(z)G(z) + F(z)G'(z),\\
H''(z) &= F''(z)G(z) + 2F'(z)G'(z) + F(z)G''(z).
\end{aligned} \]
But if we set z = 1, we can see that (8.38) and (8.39) will be valid in general
provided only that
\[ F(1) = G(1) = 1 \tag{8.40} \]
and that the derivatives exist. The “probabilities” don’t have to be in [0..1]
for these formulas to hold. We can normalize the functions F(z) and G(z)
by dividing through by F(1) and G(1) in order to make this condition valid,
whenever F(1) and G(1) are nonzero.
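The additivity rules (8.38) and (8.39) can be tried on a product in which one factor is not a genuine pgf. In the sketch below, polynomials are coefficient lists, F is the uniform pgf of order 6, and G(z) = 3 − 2z is an arbitrary pseudo-pgf chosen for illustration because it has a negative coefficient but still satisfies G(1) = 1.

```python
from fractions import Fraction

def poly_mul(f, g):
    """Product of two polynomials given as coefficient lists (a convolution)."""
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return h

def pgf_mean(c):
    return sum(k * x for k, x in enumerate(c))            # G'(1)

def pgf_var(c):
    g1 = pgf_mean(c)
    g2 = sum(k * (k - 1) * x for k, x in enumerate(c))    # G''(1)
    return g2 + g1 - g1 ** 2

F = [Fraction(1, 6)] * 6           # U_6(z), a genuine pgf
G = [Fraction(3), Fraction(-2)]    # 3 - 2z: G(1) = 1, but one coefficient is negative
H = poly_mul(F, G)
```

Only condition (8.40) is used by the identities, which is why the negative coefficient does no harm here.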
Mean and variance aren’t the whole story. They are merely two of an
infinite series of so-called cumulant statistics introduced by the Danish
astronomer Thorvald Nicolai Thiele [288] in 1903. The first two cumulants
\(\kappa_1\) and \(\kappa_2\) of a random variable are what we have called the mean and the
variance; there also are higher-order cumulants that express more subtle
properties of a distribution. The general formula
\[ \ln G(e^t) = \frac{\kappa_1}{1!}t + \frac{\kappa_2}{2!}t^2 + \frac{\kappa_3}{3!}t^3 + \frac{\kappa_4}{4!}t^4 + \cdots \tag{8.41} \]
defines the cumulants of all orders, when G(z) is the pgf of a random variable.
[Marginal note: I’ll graduate magna cum ulant.]
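Let’s look at cumulants more closely. If G(z) is the pgf for X, we have
\[ \begin{aligned}
G(e^t) &= \sum_{k\ge0} \Pr(X=k)\,e^{kt} = \sum_{k,m\ge0} \Pr(X=k)\,\frac{k^m t^m}{m!}\\
&= 1 + \frac{\mu_1}{1!}t + \frac{\mu_2}{2!}t^2 + \frac{\mu_3}{3!}t^3 + \cdots,
\end{aligned} \tag{8.42} \]
where
\[ \mu_m = \sum_{k\ge0} k^m \Pr(X=k) = E(X^m). \tag{8.43} \]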
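This quantity \(\mu_m\) is called the “mth moment” of X. We can take exponentials
on both sides of (8.41), obtaining another formula for \(G(e^t)\):
\[ \begin{aligned}
G(e^t) &= 1 + \frac{(\kappa_1 t + \frac12\kappa_2 t^2 + \cdots)}{1!} + \frac{(\kappa_1 t + \frac12\kappa_2 t^2 + \cdots)^2}{2!} + \cdots\\
&= 1 + \kappa_1 t + \tfrac12(\kappa_2 + \kappa_1^2)t^2 + \cdots.
\end{aligned} \]
Equating coefficients of powers of t leads to a series of formulas
\[ \kappa_1 = \mu_1, \tag{8.44} \]
\[ \kappa_2 = \mu_2 - \mu_1^2, \tag{8.45} \]
\[ \kappa_3 = \mu_3 - 3\mu_1\mu_2 + 2\mu_1^3, \tag{8.46} \]
\[ \kappa_4 = \mu_4 - 4\mu_1\mu_3 + 12\mu_1^2\mu_2 - 3\mu_2^2 - 6\mu_1^4, \tag{8.47} \]
\[ \kappa_5 = \mu_5 - 5\mu_1\mu_4 + 20\mu_1^2\mu_3 - 10\mu_2\mu_3 + 30\mu_1\mu_2^2 - 60\mu_1^3\mu_2 + 24\mu_1^5, \tag{8.48} \]
defining the cumulants in terms of the moments. Notice that \(\kappa_2\) is indeed the
variance, \(E(X^2) - (EX)^2\), as claimed.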
Equation (8.41) makes it clear that the cumulants defined by the product
F(z)G(z) of two pgf’s will be the sums of the corresponding cumulants of F(z)
and G(z), because logarithms of products are sums. Therefore all cumulants
of the sum of independent random variables are additive, just as the mean and
variance are. This property makes cumulants more important than moments.
[Marginal note: “For these higher half-invariants we shall propose no special names.”
- T. N. Thiele [288]]
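If we take a slightly different tack, writing
\[ G(1+t) = 1 + \frac{\alpha_1}{1!}t + \frac{\alpha_2}{2!}t^2 + \frac{\alpha_3}{3!}t^3 + \cdots, \]
equation (8.33) tells us that the \(\alpha\)’s are the “factorial moments”
\[ \begin{aligned}
\alpha_m = G^{(m)}(1) &= \sum_{k\ge0} \Pr(X=k)\,k^{\underline{m}} z^{k-m}\Big|_{z=1}\\
&= \sum_{k\ge0} k^{\underline{m}}\,\Pr(X=k)\\
&= E(X^{\underline{m}}).
\end{aligned} \tag{8.49} \]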
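It follows that
\[ \begin{aligned}
G(e^t) &= 1 + \frac{\alpha_1}{1!}(e^t - 1) + \frac{\alpha_2}{2!}(e^t - 1)^2 + \cdots\\
&= 1 + \frac{\alpha_1}{1!}\bigl(t + \tfrac12 t^2 + \cdots\bigr) + \frac{\alpha_2}{2!}\bigl(t^2 + t^3 + \cdots\bigr) + \cdots\\
&= 1 + \alpha_1 t + \tfrac12(\alpha_2 + \alpha_1)t^2 + \cdots,
\end{aligned} \]
and we can express the cumulants in terms of the derivatives \(G^{(m)}(1)\):
\[ \kappa_1 = \alpha_1, \tag{8.50} \]
\[ \kappa_2 = \alpha_2 + \alpha_1 - \alpha_1^2, \tag{8.51} \]
\[ \kappa_3 = \alpha_3 + 3\alpha_2 + \alpha_1 - 3\alpha_2\alpha_1 - 3\alpha_1^2 + 2\alpha_1^3, \tag{8.52} \]
This sequence of formulas yields “additive” identities that extend (8.38) and
(8.39) to all the cumulants.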
Let’s get back down to earth and apply these ideas to simple examples.
The simplest case of a random variable is a “random constant,” where X has
a certain fixed value x with probability 1. In this case \(G_X(z) = z^x\), and
\(\ln G_X(e^t) = xt\); hence the mean is x and all other cumulants are zero. It
follows that the operation of multiplying any pgf by \(z^x\) increases the mean
by x but leaves the variance and all other cumulants unchanged.
How do probability generating functions apply to dice? The distribution
of spots on one fair die has the pgf
\[ G(z) = \frac{z + z^2 + z^3 + z^4 + z^5 + z^6}{6} = z\,U_6(z), \]
where \(U_6\) is the pgf for the uniform distribution of order 6. The factor ‘z’
adds 1 to the mean, so the mean is 3.5 instead of \(\frac{n-1}{2} = 2.5\) as given in (8.35);
but an extra ‘z’ does not affect the variance (8.36), which equals \(\frac{35}{12}\).
The pgf for total spots on two independent dice is the square of the pgf
for spots on one die,
\[ \begin{aligned}
G_S(z) &= \frac{z^2 + 2z^3 + 3z^4 + 4z^5 + 5z^6 + 6z^7 + 5z^8 + 4z^9 + 3z^{10} + 2z^{11} + z^{12}}{36}\\
&= z^2\,U_6(z)^2.
\end{aligned} \]
If we roll a pair of fair dice n times, the probability that we get a total of
k spots overall is, similarly,
\[ [z^k]\,G_S(z)^n = [z^k]\,z^{2n}\,U_6(z)^{2n} = [z^{k-2n}]\,U_6(z)^{2n}. \]
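Extracting the coefficient of z^k from a power of the pgf is just repeated polynomial multiplication. The sketch below builds the distribution of the total for n = 3 rolls of a pair of fair dice; the helper names and the choice n = 3 are illustrative assumptions.

```python
from fractions import Fraction

def poly_mul(f, g):
    """Product of two polynomials given as coefficient lists."""
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return h

def poly_pow(f, n):
    """f(z)^n by repeated multiplication."""
    result = [Fraction(1)]
    for _ in range(n):
        result = poly_mul(result, f)
    return result

die = [0] + [Fraction(1, 6)] * 6   # z * U_6(z): one fair die
pair = poly_mul(die, die)          # G_S(z): total spots on two dice
n = 3
total = poly_pow(pair, n)          # total[k] = Pr(k spots overall in n rolls of the pair)
```

The smallest possible total, 2n, occurs only when every die shows 1, which gives a quick sanity check on the coefficients.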
In the hats-off-to-football-victory problem considered earlier, otherwise
known as the problem of enumerating the fixed points of a random permutation,
we know from (5.49) that the pgf is
\[ F_n(z) = \sum_{0\le k\le n} \frac{(n-k)¡}{(n-k)!}\,\frac{z^k}{k!}, \qquad\text{for } n\ge0. \tag{8.53} \]
[Marginal note: Hat distribution is a different kind of uniform distribution.]
Therefore
\[ \begin{aligned}
F_n'(z) &= \sum_{1\le k\le n} \frac{(n-k)¡}{(n-k)!}\,\frac{z^{k-1}}{(k-1)!}\\
&= \sum_{0\le k\le n-1} \frac{(n-1-k)¡}{(n-1-k)!}\,\frac{z^k}{k!}\\
&= F_{n-1}(z).
\end{aligned} \]
Without knowing the details of the coefficients, we can conclude from this
recurrence \(F_n'(z) = F_{n-1}(z)\) that \(F_n^{(m)}(z) = F_{n-m}(z)\); hence
\[ F_n^{(m)}(1) = F_{n-m}(1) = [n\ge m]. \tag{8.54} \]
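The recurrence between consecutive hat pgf’s can be verified coefficient by coefficient. In this sketch the subfactorial n¡ is computed with the standard derangement recurrence d(n) = n·d(n−1) + (−1)^n, which is an assumption brought in from outside this section; the function names are illustrative.

```python
from fractions import Fraction
from math import factorial

def subfactorial(n):
    """Number of derangements of n objects, via d(n) = n*d(n-1) + (-1)^n, d(0) = 1."""
    d = 1
    for i in range(1, n + 1):
        d = i * d + (-1) ** i
    return d

def hat_pgf(n):
    """Coefficient list of F_n(z) as given by (8.53)."""
    return [Fraction(subfactorial(n - k), factorial(n - k) * factorial(k))
            for k in range(n + 1)]

def derivative(coeffs):
    """Coefficient list of the derivative of a polynomial."""
    return [k * c for k, c in enumerate(coeffs)][1:]
```

Since the derivative of `hat_pgf(n)` equals `hat_pgf(n - 1)` exactly, repeated differentiation walks down to F_0(z) = 1, which is what (8.54) expresses.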
This formula makes it easy to calculate the mean and variance; we find as
before (but more quickly) that they are both equal to 1 when n ≥ 2.
In fact, we can now show that the mth cumulant \(\kappa_m\) of this random
variable is equal to 1 whenever n ≥ m. For the mth cumulant depends only
on \(F_n'(1), F_n''(1), \ldots, F_n^{(m)}(1)\), and these are all equal to 1; hence we obtain
the same answer for the mth cumulant as we do when we replace \(F_n(z)\) by
the limiting pgf
\[ F_\infty(z) = e^{z-1}, \tag{8.55} \]
which has \(F_\infty^{(m)}(1) = 1\) for derivatives of all orders. The cumulants of \(F_\infty\) are
identically equal to 1, because
\[ \ln F_\infty(e^t) = \ln e^{e^t-1} = e^t - 1 = \frac{t}{1!} + \frac{t^2}{2!} + \frac{t^3}{3!} + \cdots. \]
[Marginal note: Con artists know that p ≈ 0.1 when you spin a newly minted U.S. penny on a
smooth table. (The weight distribution makes Lincoln’s head fall downward.)]
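The claim that the first n cumulants of the fixed-point count all equal 1 can be tested numerically for n = 5. The sketch below converts moments to cumulants with the recurrence κ_m = μ_m − Σ_{1≤j<m} C(m−1, j−1) κ_j μ_{m−j}; this recurrence is a standard identity assumed here (it is not stated in the text), and the test compares it with (8.46) and (8.47).

```python
from fractions import Fraction
from itertools import permutations
from math import comb

def cumulants_from_moments(mu):
    """Given mu[m] = E(X^m) for m = 0..M (with mu[0] = 1), return a list kappa
    with kappa[m] the m-th cumulant for m = 1..M (kappa[0] is unused)."""
    kappa = [None] * len(mu)
    for m in range(1, len(mu)):
        kappa[m] = mu[m] - sum(comb(m - 1, j - 1) * kappa[j] * mu[m - j]
                               for j in range(1, m))
    return kappa

n = 5
perms = list(permutations(range(n)))
fixed = [sum(i == v for i, v in enumerate(p)) for p in perms]   # fixed-point counts
moments = [Fraction(sum(f ** m for f in fixed), len(perms)) for m in range(n + 1)]
kappa = cumulants_from_moments(moments)
```

The moments that come out of the enumeration are 1, 2, 5, 15, 52, and the recurrence turns them into five cumulants equal to 1.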
8.4 FLIPPING COINS
Now let’s turn to processes that have just two outcomes. If we flip
a coin, there’s probability p that it comes up heads and probability q that it
comes up tails, where
\[ p + q = 1. \]
(We assume that the coin doesn’t come to rest on its edge, or fall into a hole,
etc.) Throughout this section, the numbers p and q will always sum to 1. If
the coin is fair, we have \(p = q = \frac12\); otherwise the coin is said to be biased.
The probability generating function for the number of heads after one
toss of a coin is
\[ H(z) = q + pz. \tag{8.56} \]
If we toss the coin n times, always assuming that different coin tosses are
independent, the number of heads is generated by
\[ H(z)^n = (q + pz)^n = \sum_{k\ge0} \binom{n}{k} p^k q^{n-k} z^k, \tag{8.57} \]
according to the binomial theorem. Thus, the chance that we obtain exactly k
heads in n tosses is \(\binom{n}{k} p^k q^{n-k}\). This sequence of probabilities is called the
binomial distribution.
Suppose we toss a coin repeatedly until heads first turns up. What is
the probability that exactly k tosses will be required? We have k = 1 with
probability p (since this is the probability of heads on the first flip); we have
k = 2 with probability qp (since this is the probability of tails first, then
heads); and for general k the probability is \(q^{k-1}p\). So the generating function
is
\[ pz + qpz^2 + q^2pz^3 + \cdots = \frac{pz}{1-qz}. \tag{8.58} \]
Repeating the process until n heads are obtained gives the pgf
\[ \Bigl(\frac{pz}{1-qz}\Bigr)^n = \sum_k \binom{n+k-1}{k} p^n q^k z^{n+k}. \tag{8.59} \]
This, incidentally, is \(z^n\) times
\[ \Bigl(\frac{p}{1-qz}\Bigr)^n = \sum_k \binom{n+k-1}{k} p^n q^k z^k, \tag{8.60} \]
the generating function for the negative binomial distribution.
The probability space in example (8.59), where we flip a coin until
n heads have appeared, is different from the probability spaces we’ve seen
earlier in this chapter, because it contains infinitely many elements. Each
element is a finite sequence of heads and/or tails, containing precisely n heads
in all, and ending with heads; the probability of such a sequence is \(p^n q^{k-n}\),
where k − n is the number of tails. Thus, for example, if n = 3 and if we
write H for heads and T for tails, the sequence THTTTHH is an element of the
probability space, and its probability is \(qpqqqpp = p^3q^4\).
[Marginal note: Heads I win, tails you lose. No? OK; tails you lose, heads I win. No? Well,
then, heads you lose, tails I win.]
Let X be a random variable with the binomial distribution (8.57), and let
Y be a random variable with the negative binomial distribution (8.60). These
distributions depend on n and p. The mean of X is nH'(1) = np, since its
pgf is \(H(z)^n\); the variance is
\[ n\bigl(H''(1) + H'(1) - H'(1)^2\bigr) = n(0 + p - p^2) = npq. \tag{8.61} \]
Thus the standard deviation is \(\sqrt{npq}\): If we toss a coin n times, we expect
to get heads about \(np \pm \sqrt{npq}\) times. The mean and variance of Y can be
found in a similar way: If we let
\[ G(z) = \frac{p}{1-qz}, \]
we have
\[ G'(z) = \frac{pq}{(1-qz)^2}, \qquad G''(z) = \frac{2pq^2}{(1-qz)^3}; \]
hence \(G'(1) = pq/p^2 = q/p\) and \(G''(1) = 2pq^2/p^3 = 2q^2/p^2\). It follows that
the mean of Y is nq/p and the variance is \(nq/p^2\).
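Both sets of formulas can be checked numerically. The sketch below treats the binomial case exactly and the negative binomial case by summing a truncated series in floating point; the parameter values n = 4, p = 1/3 and the cutoff of 2000 terms are arbitrary choices made for illustration.

```python
from fractions import Fraction
from math import comb

def mean_var(probs):
    """Mean and variance of a distribution given as a list: probs[k] = Pr(value k)."""
    mean = sum(k * pk for k, pk in enumerate(probs))
    second = sum(k * k * pk for k, pk in enumerate(probs))
    return mean, second - mean ** 2

n, p = 4, Fraction(1, 3)
q = 1 - p
binomial = [comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)]   # (8.57)

def negative_binomial_stats(n, p, terms=2000):
    """Total probability, mean and variance of (8.60), truncated after `terms` terms.
    Successive terms C(n+k-1, k) p^n q^k are obtained from the ratio q(n+k)/(k+1)."""
    q = 1.0 - p
    term = p ** n                    # the k = 0 term
    total = mean = second = 0.0
    for k in range(terms):
        total += term
        mean += k * term
        second += k * k * term
        term *= q * (n + k) / (k + 1)
    return total, mean, second - mean ** 2

nb_total, nb_mean, nb_var = negative_binomial_stats(n, float(p))
```

With these parameters the formulas predict mean np = 4/3 and variance npq = 8/9 for X, and mean nq/p = 8 and variance nq/p² = 24 for Y.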
A simpler way to derive the mean and variance of Y is to use the reciprocal
generating function
\[ F(z) = \frac{1-qz}{p} = \frac1p - \frac{q}{p}z, \tag{8.62} \]
and to write
\[ G(z)^n = F(z)^{-n}. \tag{8.63} \]
This polynomial F(z) is not a probability generating function, because it has
a negative coefficient. But it does satisfy the crucial condition F(1) = 1.
Thus F(z) is formally a binomial that corresponds to a coin for which we
get heads with “probability” equal to −q/p; and G(z) is formally equivalent
to flipping such a coin −1 times(!). The negative binomial distribution
with parameters (n, p) can therefore be regarded as the ordinary binomial
distribution with parameters (n', p') = (−n, −q/p). Proceeding formally,
the mean must be n'p' = (−n)(−q/p) = nq/p, and the variance must be
\(n'p'q' = (-n)(-q/p)(1 + q/p) = nq/p^2\). This formal derivation involving
negative probabilities is valid, because our derivation for ordinary binomials
was based on identities between formal power series in which the assumption
0 ≤ p ≤ 1 was never used.
[Marginal note: The probability is negative that I’m getting younger. Oh? Then it’s > 1 that
you’re getting older, or staying the same.]
Let’s move on to another example: How many times do we have to flip
a coin until we get heads twice in a row? The probability space now consists
of all sequences of H’s and T’s that end with HH but have no consecutive H’s
until the final position:
\[ \Omega = \{\mathrm{HH}, \mathrm{THH}, \mathrm{TTHH}, \mathrm{HTHH}, \mathrm{TTTHH}, \mathrm{THTHH}, \mathrm{HTTHH}, \ldots\}. \]
The probability of any given sequence is obtained by replacing H by p and T
by q; for example, the sequence THTHH will occur with probability
Pr(THTHH) = qpqpp = p^3 q^2.
We can now play with generating functions as we did at the beginning
of Chapter 7, letting S be the infinite sum
S = HH + THH + TTHH + HTHH + TTTHH + THTHH + HTTHH + ...
of all the elements of \(\Omega\). If we replace each H by pz and each T by qz, we get
the probability generating function for the number of flips needed until two
consecutive heads turn up.
There’s a curious relation between S and the sum T of domino tilings
in equation (7.1). Indeed, we obtain S from T if we replace each vertical domino ▯ by T and
each pair of horizontal dominoes ⊟ by HT, then tack on an HH at the end. This correspondence
is easy to prove because each element of \(\Omega\) has the form \((\mathrm{T} + \mathrm{HT})^n\mathrm{HH}\) for some n ≥ 0,
and each term of T has the form \((▯ + ⊟)^n\). Therefore by (7.4) we have
\[ S = (1 - \mathrm{T} - \mathrm{HT})^{-1}\,\mathrm{HH}, \]
and the probability generating function for our problem is
\[ \begin{aligned}
G(z) &= \bigl(1 - qz - (pz)(qz)\bigr)^{-1}(pz)^2\\
&= \frac{p^2z^2}{1 - qz - pqz^2}.
\end{aligned} \tag{8.64} \]
Our experience with the negative binomial distribution gives us a clue
that we can most easily calculate the mean and variance of (8.64) by writing
\[ G(z) = \frac{z^2}{F(z)}, \]
where
\[ F(z) = \frac{1 - qz - pqz^2}{p^2}, \]
and by calculating the “mean” and “variance” of this pseudo-pgf F(z). (Once
again we’ve introduced a function with F(1) = 1.) We have
\[ \begin{aligned}
F'(1) &= (-q - 2pq)/p^2 = 2 - p^{-1} - p^{-2};\\
F''(1) &= -2pq/p^2 = 2 - 2p^{-1}.
\end{aligned} \]
Therefore, since \(z^2 = F(z)G(z)\), \(\mathrm{Mean}(z^2) = 2\), and \(\mathrm{Var}(z^2) = 0\), the mean
and variance of distribution G(z) are
\[ \mathrm{Mean}(G) = 2 - \mathrm{Mean}(F) = p^{-2} + p^{-1}; \tag{8.65} \]
\[ \mathrm{Var}(G) = -\mathrm{Var}(F) = p^{-4} + 2p^{-3} - 2p^{-2} - p^{-1}. \tag{8.66} \]
When \(p = \frac12\) the mean and variance are 6 and 22, respectively. (Exercise 4
discusses the calculation of means and variances by subtraction.)
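Formulas (8.65) and (8.66) can be cross-checked by expanding (8.64) as a power series. Multiplying G(z) by its denominator gives the recurrence g_k = q·g_{k−1} + pq·g_{k−2} with g_2 = p², which the sketch below iterates in floating point; the function name and the cutoff of 2000 terms are illustrative assumptions.

```python
def hh_stats(p, terms=2000):
    """Total probability, mean and variance of the number of flips until HH,
    from the first `terms` coefficients of G(z) = p^2 z^2 / (1 - q z - p q z^2)."""
    q = 1.0 - p
    g = [0.0, 0.0, p * p]            # g[k] = coefficient of z^k; g[2] = Pr(HH)
    for k in range(3, terms):
        g.append(q * g[k - 1] + p * q * g[k - 2])
    total = sum(g)
    mean = sum(k * c for k, c in enumerate(g))
    var = sum(k * k * c for k, c in enumerate(g)) - mean ** 2
    return total, mean, var

def hh_formulas(p):
    """Mean and variance predicted by (8.65) and (8.66)."""
    a = 1.0 / p
    return a ** 2 + a, a ** 4 + 2 * a ** 3 - 2 * a ** 2 - a
```

For example, `hh_stats(0.5)` should come out very close to a total of 1, a mean of 6 and a variance of 22, up to truncation and rounding error.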
[Marginal note: “‘You really are an automaton - a calculating machine,’ I cried. ‘There is
something positively inhuman in you at times.’”
- J. H. Watson [70]]
Now let’s try a more intricate experiment: We will flip coins until the
pattern THTTH is first obtained. The sum of winning positions is now
S = THTTH
+
HTHTTH + TTHTTH
+
HHTHTTH
+
HTTHTTH + THTHTTH + TTTHTTH + .
;
this sum is more difficult to describe than the previous one. If we go back to
the method by which we solved the domino problems in Chapter 7, we can
obtain a formula for S by considering it as a “finite state language” defined
by the following “automaton”:
The elementary events in the probability space are the sequences of H’s and
T’s that lead from state 0 to state 5. Suppose, for example, that we have
just seen THT; then we are in state 3. Flipping tails now takes us to state 4;
flipping heads in state 3 would take us to state 2 (not all the way back to
state 0, since the TH we’ve just seen may be followed by TTH).
In this formulation, we can let $S_k$ be the sum of all sequences of H’s and
T’s that lead to state $k$; it follows that
    $S_0 = 1 + S_0 H + S_2 H$,
    $S_1 = S_0 T + S_1 T + S_4 T$,
    $S_2 = S_1 H + S_3 H$,
    $S_3 = S_2 T$,
    $S_4 = S_3 T$,
    $S_5 = S_4 H$.
Now the sum $S$ in our problem is $S_5$; we can obtain it by solving these six
equations in the six unknowns $S_0, S_1, \ldots, S_5$. Replacing H by $pz$ and T by $qz$
gives generating functions where the coefficient of $z^n$ in $S_k$ is the probability
that we are in state $k$ after $n$ flips.
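The state equations above can be traced numerically. The following is a minimal sketch, not the book's method: it builds the transition table of the automaton for an arbitrary pattern by brute force and then pushes probability mass through it one flip at a time. The function names `transitions` and `first_occurrence_probs` are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction


def transitions(pattern):
    """delta[k][c] is the state reached from state k (k characters matched) on flip c."""
    m = len(pattern)
    delta = []
    for k in range(m):
        row = {}
        for c in "HT":
            seen = pattern[:k] + c
            # The new state is the longest suffix of `seen` that is a prefix of the pattern.
            j = len(seen)
            while j > 0 and seen[len(seen) - j:] != pattern[:j]:
                j -= 1
            row[c] = j
        delta.append(row)
    return delta


def first_occurrence_probs(pattern, p, nmax):
    """probs[n] = probability that the pattern is first completed at flip n (H has probability p)."""
    q = 1 - p
    m = len(pattern)
    delta = transitions(pattern)
    state = [Fraction(0)] * (m + 1)
    state[0] = Fraction(1)
    probs = [Fraction(0)]
    for _ in range(nmax):
        new = [Fraction(0)] * (m + 1)
        for k in range(m):  # state m is final, so no mass leaves it
            new[delta[k]["H"]] += state[k] * p
            new[delta[k]["T"]] += state[k] * q
        probs.append(new[m])
        state = new
    return probs


if __name__ == "__main__":
    print(transitions("THTTH"))
    print(first_occurrence_probs("THTTH", Fraction(1, 2), 8))
```

For THTTH the table reproduces the transitions described in the text, for instance heads in state 3 leads to state 2 and heads in state 2 leads back to state 0.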
In the same way, any diagram of transitions between states, where the
transition from state j to state k occurs with given probability
pj,k,
leads to
a set of simultaneous linear equations whose solutions are generating func-
tions for the state probabilities after n transitions have occurred. Systems
of this kind are called Markov processes, and the theory of their behavior is
intimately related to the theory of linear equations.
But the coin-flipping problem can be solved in a much simpler way,
without the complexities of the general finite-state approach. Instead of six
equations in six unknowns $S_0, S_1, \ldots, S_5$, we can characterize $S$ with only
two equations in two unknowns. The trick is to consider the auxiliary sum
$N = S_0 + S_1 + S_2 + S_3 + S_4$ of all flip sequences that don’t contain any
occurrences of the given pattern THTTH:
    N = 1 + H + T + HH + ... + THTHT + THTTT + ... .
We have
    1 + N(H + T) = N + S,    (8.67)
because every term on the left either ends with THTTH (and belongs to S) or
doesn’t (and belongs to N); conversely, every term on the right is either empty
or belongs to N H or N T. And we also have the important additional equation
NTHTTH = S+STTH, (8.68)
because every term on the left completes a term of S after either the first H
or the second H, and because every term on the right belongs to the left.
The solution to these two simultaneous equations is easily obtained: We
have $N = (1 - S)(1 - H - T)^{-1}$ from (8.67), hence
$$(1 - S)(1 - T - H)^{-1}\,\mathrm{THTTH} = S(1 + \mathrm{TTH}).$$
As before, we get the probability generating function $G(z)$ for the number of
flips if we replace H by $pz$ and T by $qz$. A bit of simplification occurs since
$p + q = 1$, and we find
$$\frac{(1 - G(z))\,p^2q^3z^5}{1 - z} = G(z)(1 + pq^2z^3);$$
hence the solution is
$$G(z) = \frac{p^2q^3z^5}{p^2q^3z^5 + (1 + pq^2z^3)(1 - z)}. \tag{8.69}$$
Notice that G( 1) = 1, if pq # 0; we do eventually encounter the pattern
THTTH, with probability
1,
unless the coin is rigged so that it always comes
up heads or always tails.
To get the mean and variance of the distribution (8.69), we invert $G(z)$
as we did in the previous problem, writing $G(z) = z^5/F(z)$ where $F$ is a
polynomial:
$$F(z) = \frac{p^2q^3z^5 + (1 + pq^2z^3)(1 - z)}{p^2q^3}. \tag{8.70}$$
The relevant derivatives are
$$F'(1) = 5 - (1 + pq^2)/p^2q^3, \qquad F''(1) = 20 - 6pq^2/p^2q^3;$$
and if $X$ is the number of flips we get
$$EX = \mathrm{Mean}(G) = 5 - \mathrm{Mean}(F) = p^{-2}q^{-3} + p^{-1}q^{-1}; \tag{8.71}$$
$$VX = \mathrm{Var}(G) = -\mathrm{Var}(F) = -25 + p^{-2}q^{-3} + 7p^{-1}q^{-1} + \mathrm{Mean}(F)^2 = (EX)^2 - 9p^{-2}q^{-3} - 3p^{-1}q^{-1}. \tag{8.72}$$
When $p = \frac12$, the mean and variance are 36 and 996.
Let’s get general: The problem we have just solved was “random” enough
to show us how to analyze the case that we are waiting for the first appearance
of an arbitrary pattern A of heads and tails. Again we let S be the sum of
all winning sequences of H's and T’s, and we let N be the sum of all sequences
that haven’t encountered the pattern A yet. Equation (8.67) will remain the
same; equation (8.68) will become
$$NA = S\bigl(1 + A^{(1)}\,[A^{(m-1)} = A_{(m-1)}] + A^{(2)}\,[A^{(m-2)} = A_{(m-2)}] + \cdots + A^{(m-1)}\,[A^{(1)} = A_{(1)}]\bigr), \tag{8.73}$$
where $m$ is the length of $A$, and where $A^{(k)}$ and $A_{(k)}$ denote respectively the
last $k$ characters and the first $k$ characters of $A$. For example, if $A$ is the
pattern THTTH we just studied, we have
    $A^{(1)}$ = H,  $A^{(2)}$ = TH,  $A^{(3)}$ = TTH,  $A^{(4)}$ = HTTH;
    $A_{(1)}$ = T,  $A_{(2)}$ = TH,  $A_{(3)}$ = THT,  $A_{(4)}$ = THTT.
Since the only perfect match is $A^{(2)} = A_{(2)}$, equation (8.73) reduces to (8.68).
Let $\tilde A$ be the result of substituting $p^{-1}$ for H and $q^{-1}$ for T in the
pattern $A$. Then it is not difficult to generalize our derivation of (8.71) and (8.72)
to conclude (exercise 20) that the general mean and variance are
$$EX = \sum_{k=1}^{m} \tilde A_{(k)}\,[A^{(k)} = A_{(k)}]; \tag{8.74}$$
$$VX = (EX)^2 - \sum_{k=1}^{m} (2k-1)\,\tilde A_{(k)}\,[A^{(k)} = A_{(k)}]. \tag{8.75}$$
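Formulas (8.74) and (8.75) translate almost directly into code. The sketch below uses exact rational arithmetic; the function name `waiting_time_stats` is a hypothetical choice for this illustration, and patterns are assumed to be strings over the letters H and T.

```python
from fractions import Fraction


def waiting_time_stats(pattern, p):
    """Return (EX, VX) for the first appearance of `pattern`, following (8.74) and (8.75)."""
    p = Fraction(p)
    q = 1 - p
    m = len(pattern)
    weight = {"H": 1 / p, "T": 1 / q}  # the substitution that defines A-tilde
    mean = Fraction(0)
    correction = Fraction(0)
    a_tilde = Fraction(1)
    for k in range(1, m + 1):
        a_tilde *= weight[pattern[k - 1]]   # A-tilde_(k), built from the first k characters
        if pattern[m - k:] == pattern[:k]:  # the bracket [A^(k) = A_(k)]
            mean += a_tilde
            correction += (2 * k - 1) * a_tilde
    return mean, mean * mean - correction


if __name__ == "__main__":
    print(waiting_time_stats("THTTH", Fraction(1, 2)))
    print(waiting_time_stats("HH", Fraction(1, 2)))
```

With a fair coin this reproduces the values quoted in the text: mean 36 and variance 996 for THTTH, and mean 6 and variance 22 for HH.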
In the special case $p = \frac12$ we can interpret these formulas in a particularly
simple way. Given a pattern $A$ of $m$ heads and tails, let
$$A{:}A = \sum_{k=1}^{m} 2^{k-1}\,[A^{(k)} = A_{(k)}]. \tag{8.76}$$
We can easily find the binary representation of this number by placing a ‘1’
under each position such that the string matches itself perfectly when it is
superimposed on a copy of itself that has been shifted to start in this position:
    A   = HTHTHHTHTH
    A:A = (1000010101)_2 = 512 + 16 + 4 + 1 = 533
          HTHTHHTHTH            ✓
           HTHTHHTHTH
            HTHTHHTHTH
             HTHTHHTHTH
              HTHTHHTHTH
               HTHTHHTHTH       ✓
                HTHTHHTHTH
                 HTHTHHTHTH     ✓
                  HTHTHHTHTH
                   HTHTHHTHTH   ✓
Equation (8.74) now tells us that the expected number of flips until pattern $A$
appears is exactly $2(A{:}A)$, if we use a fair coin, because $\tilde A_{(k)} = 2^k$ when
$p = q = \frac12$.
This result, first discovered by the Soviet mathematician A. D.
Solov’ev in 1966 [271], seems paradoxical at first glance: Patterns with no
self-overlaps occur sooner than overlapping patterns do! It takes almost twice
as long to encounter HHHHH as it does to encounter HHHHT or THHHH.
[Margin: “Chem bol’she periodov u nashego slova, tem pozzhe ono poyavlyaetsya.” (“The more periods our word has, the later it appears.”) -A. D. Solov’ev]
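The fair-coin rule is easy to try out. The following sketch computes A:A from definition (8.76) and doubles it to obtain the expected waiting time; the function names are hypothetical and patterns are assumed to be strings of H and T.

```python
def self_correlation(pattern):
    """A:A as in (8.76): add 2^(k-1) whenever the last k characters equal the first k."""
    m = len(pattern)
    return sum(2 ** (k - 1)
               for k in range(1, m + 1)
               if pattern[m - k:] == pattern[:k])


def expected_flips_fair(pattern):
    """Expected number of fair-coin flips until `pattern` first appears, namely 2(A:A)."""
    return 2 * self_correlation(pattern)


if __name__ == "__main__":
    for pattern in ("HTHTHHTHTH", "HHHHH", "HHHHT", "THHHH"):
        print(pattern, self_correlation(pattern), expected_flips_fair(pattern))
```

The pattern HHHHH overlaps itself at every shift, so its waiting time is 62 flips, while HHHHT and THHHH have no self-overlaps and need only 32.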
Now let’s consider an amusing game that was invented by (of all people)
Walter Penney [231] in 1969. Alice and Bill flip a coin until either HHT or
HTT occurs; Alice wins if the pattern HHT comes first, Bill wins if HTT comes
first. This game, now called “Penney ante,” certainly seems to be fair, if
played with a fair coin, because both patterns HHT and HTT have the same
characteristics if we look at them in isolation: The probability generating
function for the waiting time until HHT first occurs is
$$G(z) = \frac{z^3}{z^3 - 8(z - 1)},$$
and the same is true for HTT. Therefore neither Alice nor Bill has an
advantage, if they play solitaire.
[Margin: “Of course not! Who could they have an advantage over?”]
But there’s an interesting interplay between the patterns when both are
considered simultaneously. Let $S_A$ be the sum of Alice’s winning
configurations, and let $S_B$ be the sum of Bill’s:
    $S_A$ = HHT + HHHT + THHT + HHHHT + HTHHT + THHHT + ... ;
    $S_B$ = HTT + THTT + HTHTT + TTHTT + THTHTT + TTTHTT + ... .
Also, taking our cue from the trick that worked when only one pattern was
involved, let us denote by $N$ the sum of all sequences in which neither player
has won so far:
    N = 1 + H + T + HH + HT + TH + TT + HHH + HTH + THH + ... .    (8.77)
Then we can easily verify the following set of equations:
    $1 + N(H + T) = N + S_A + S_B$;
    $N\,\mathrm{HHT} = S_A$;    (8.78)
    $N\,\mathrm{HTT} = S_A T + S_B$.
If we now set $H = T = \frac12$, the resulting value of $S_A$ becomes the probability
that Alice wins, and $S_B$ becomes the probability that Bill wins. The three
equations reduce to
$$1 + N = N + S_A + S_B; \qquad \tfrac18 N = S_A; \qquad \tfrac18 N = \tfrac12 S_A + S_B;$$
and we find $S_A = \frac23$, $S_B = \frac13$. Alice will win about twice as often as Bill!
In a generalization of this game, Alice and Bill choose patterns $A$ and $B$
of heads and tails, and they flip coins until either $A$ or $B$ appears. The
two patterns need not have the same length, but we assume that $A$ doesn’t
occur within $B$, nor does $B$ occur within $A$. (Otherwise the game would be
degenerate. For example, if $A$ = HT and $B$ = THTH, poor Bill could never win;
and if $A$ = HTH and $B$ = TH, both players might claim victory simultaneously.)
Then we can write three equations analogous to (8.73) and (8.78):
$$1 + N(H + T) = N + S_A + S_B;$$
$$NA = S_A \sum_{k=1}^{l} A^{(l-k)}\,[A^{(k)} = A_{(k)}] + S_B \sum_{k=1}^{\min(l,m)} A^{(l-k)}\,[B^{(k)} = A_{(k)}];$$
$$NB = S_A \sum_{k=1}^{\min(l,m)} B^{(m-k)}\,[A^{(k)} = B_{(k)}] + S_B \sum_{k=1}^{m} B^{(m-k)}\,[B^{(k)} = B_{(k)}]. \tag{8.79}$$
Here $l$ is the length of $A$ and $m$ is the length of $B$. For example, if we have
$A$ = HTTHTHTH and $B$ = THTHTTH, the two pattern-dependent equations are
    $N$ HTTHTHTH = $S_A$ TTHTHTH + $S_A$ + $S_B$ TTHTHTH + $S_B$ THTH ;
    $N$ THTHTTH = $S_A$ THTTH + $S_A$ TTH + $S_B$ THTTH + $S_B$ .
We obtain the victory probabilities by setting $H = T = \frac12$, if we assume that a
fair coin is being used; this reduces the two crucial equations to
$$N = S_A \sum_{k=1}^{l} 2^k\,[A^{(k)} = A_{(k)}] + S_B \sum_{k=1}^{\min(l,m)} 2^k\,[B^{(k)} = A_{(k)}];$$
$$N = S_A \sum_{k=1}^{\min(l,m)} 2^k\,[A^{(k)} = B_{(k)}] + S_B \sum_{k=1}^{m} 2^k\,[B^{(k)} = B_{(k)}]. \tag{8.80}$$
We can see what’s going on if we generalize the $A{:}A$ operation of (8.76) to a
function of two independent strings $A$ and $B$:
$$A{:}B = \sum_{k=1}^{\min(l,m)} 2^{k-1}\,[A^{(k)} = B_{(k)}]. \tag{8.81}$$
Equations (8.80) now become simply
$$S_A(A{:}A) + S_B(B{:}A) = S_A(A{:}B) + S_B(B{:}B);$$
the odds in Alice’s favor are
$$\frac{S_A}{S_B} = \frac{B{:}B - B{:}A}{A{:}A - A{:}B}. \tag{8.82}$$
(This beautiful formula was discovered by John Horton Conway [111].)
For example, if $A$ = HTTHTHTH and $B$ = THTHTTH as above, we have
$A{:}A = (10000001)_2 = 129$, $A{:}B = (0001010)_2 = 10$, $B{:}A = (0001001)_2 = 9$,
and $B{:}B = (1000010)_2 = 66$; so the ratio $S_A/S_B$ is $(66-9)/(129-10) = 57/119$.
Alice will win this one only 57 times out of every 176, on the average.
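Conway's formula (8.82) can be checked mechanically. This is a small sketch under the text's assumptions (a fair coin, and neither pattern occurring inside the other); the names `correlation` and `odds_for_alice` are hypothetical.

```python
from fractions import Fraction


def correlation(a, b):
    """A:B as in (8.81): add 2^(k-1) whenever the last k characters of a equal the first k of b."""
    return sum(2 ** (k - 1)
               for k in range(1, min(len(a), len(b)) + 1)
               if a[len(a) - k:] == b[:k])


def odds_for_alice(a, b):
    """S_A / S_B from (8.82), where Alice holds pattern a and Bill holds pattern b."""
    return Fraction(correlation(b, b) - correlation(b, a),
                    correlation(a, a) - correlation(a, b))


if __name__ == "__main__":
    print(odds_for_alice("HTTHTHTH", "THTHTTH"))
    print(odds_for_alice("HHT", "HTT"))
```

The first call returns 57/119, and the second returns 2, matching the probabilities 2/3 and 1/3 found for the original game with HHT and HTT.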
Strange things can happen in Penney’s game. For example, the pattern
HHTH wins over the pattern HTHH with 3/2 odds, and HTHH wins over THHH with
7/5 odds. So HHTH ought to be much better than THHH. Yet THHH actually wins
over HHTH, with 7/5 odds! The relation between patterns is not transitive. In
fact, exercise 57 proves that if Alice chooses any pattern $\tau_1\tau_2\ldots\tau_l$ of length
$l \ge 3$, Bill can always ensure better than even chances of winning if he chooses
the pattern $\bar\tau_2\tau_1\tau_2\ldots\tau_{l-1}$, where $\bar\tau_2$ is the heads/tails opposite of $\tau_2$.
[Margin: “Odd, odd.”]
For example, suppose the keys are names, and suppose that there are
$m = 4$ lists based on the first letter of a name:
    h(name) = 1, for A-F;
              2, for G-L;
              3, for M-R;
              4, for S-Z.
We start with four empty lists and with $n = 0$. If, say, the first record has
Nora as its key, we have h(Nora) = 3, so Nora becomes the key of the first
item in list 3. If the next two names are Glenn and Jim, they both go into
list 2. Now the tables in memory look like this:
    FIRST[1] = -1,  FIRST[2] = 2,  FIRST[3] = 1,  FIRST[4] = -1.
    KEY[1] = Nora,   NEXT[1] = 0;
    KEY[2] = Glenn,  NEXT[2] = 3;
    KEY[3] = Jim,    NEXT[3] = 0;    n = 3.
(The values of DATA[1], DATA[2], and DATA[3] are confidential and will not
be shown.) After 18 records have been inserted, the lists might contain the
names
    list 1    list 2      list 3     list 4
    Dianne    Glenn       Nora       Scott
    Ari       Jim         Mike       Tina
    Brian     Jennifer    Michael
    Fran      Joan        Ray
    Doug      Jerry       Paula
              Jean
[Margin: “Let’s hear it for the Concrete Math students who sat in the front rows and lent their names to this experiment.”]
and these names would appear intermixed in the KEY array with NEXT entries
to keep the lists effectively separate. If we now want to search for John, we
have to scan through the six names in list 2 (which happens to be the longest
list); but that’s not nearly as bad as looking at all 18 names.
Here’s a precise specification of the algorithm that searches for key $K$ in
accordance with this scheme:
H1 Set i := h(K) and j := FIRST[i].
H2 If j ≤ 0, stop. (The search was unsuccessful.)
H3 If KEY[j] = K, stop. (The search was successful.)
H4 Set i := j, then set j := NEXT[i] and return to step H2. (We’ll try again.)
For example, to search for Jennifer in the example given, step H1 would set
i := 2 and j := 2; step H3 would find that Glenn ≠ Jennifer; step H4 would
set j := 3; and step H3 would find Jim ≠ Jennifer.
[Margin: “I bet their parents are glad about that.”]
8.5 HASHING 399
After a successful search, the desired data D(K) appears in DATA[j], as in
the previous algorithm. After an unsuccessful search, we can enter K and D(K)
in the table by doing the following operations:
    n := n + 1;
    if j < 0 then FIRST[i] := n else NEXT[i] := n;
    KEY[n] := K;  DATA[n] := D(K);  NEXT[n] := 0.    (8.83)
Now the table will once again be up to date.
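Steps H1 through H4 and the insertion operations (8.83) can be sketched together in Python. This is an illustrative transcription rather than a production hash table: the class name `ChainedTable`, the helper `first_letter_hash`, and the choice to return the probe count are assumptions made for the example. Arrays are kept 1-based, with index 0 unused, so that the indices agree with the text.

```python
def first_letter_hash(name):
    """The example hash: 1 for A-F, 2 for G-L, 3 for M-R, 4 for S-Z."""
    return min((ord(name[0].upper()) - ord("A")) // 6 + 1, 4)


class ChainedTable:
    def __init__(self, m, h):
        self.h = h
        self.first = [-1] * (m + 1)  # FIRST[1..m]; -1 marks an empty list
        self.key = [None]            # KEY[1..n]
        self.data = [None]           # DATA[1..n]
        self.next = [0]              # NEXT[1..n]; 0 marks the end of a list
        self.n = 0

    def search(self, K):
        """Steps H1-H4. Returns (i, j, probes); j > 0 means that KEY[j] = K."""
        i = self.h(K)                # H1
        j = self.first[i]
        probes = 0
        while j > 0:                 # H2: stop when j <= 0
            probes += 1
            if self.key[j] == K:     # H3: one probe
                return i, j, probes
            i, j = j, self.next[j]   # H4
        return i, j, probes

    def insert(self, K, D):
        """Operations (8.83), carried out after an unsuccessful search."""
        i, j, _ = self.search(K)
        if j > 0:                    # key already present: just update its data
            self.data[j] = D
            return j
        self.n += 1
        if j < 0:
            self.first[i] = self.n
        else:
            self.next[i] = self.n
        self.key.append(K)
        self.data.append(D)
        self.next.append(0)
        return self.n


if __name__ == "__main__":
    table = ChainedTable(4, first_letter_hash)
    for name in ("Nora", "Glenn", "Jim"):
        table.insert(name, None)
    print(table.first[1:], table.key[1:], table.next[1:])
    print(table.search("Jennifer"))
```

After inserting Nora, Glenn, and Jim the arrays match the memory picture in the text, and the search for Jennifer makes two probes before stopping with j = 0.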
We hope to get lists of roughly equal length, because this will make the
task of searching about m times faster. The value of m is usually much greater
than 4, so a factor of l/m will be a significant improvement.
We don’t know in advance what keys will be present, but it is generally
possible to choose the hash function h so that we can consider h(K) to be a
random variable that is uniformly distributed between 1 and m, independent
of the hash values of other keys that are present. In such cases computing the
hash function is like rolling a die that has m faces. There’s a chance that all
the records will fall into the same list, just as there’s a chance that a die will
always turn up q ; but probability theory tells us that the lists will almost
always be pretty evenly balanced.
Analysis of Hashing: Introduction.
“Algorithmic analysis” is a branch of computer science that derives quan-
titative information about the efficiency of computer methods. “Probabilistic
analysis of an algorithm” is the study of an algorithm’s running time, con-
sidered as a random variable that depends on assumed characteristics of the
input data. Hashing is an especially good candidate for probabilistic analysis,
because it is an extremely efficient method on the average, even though its
worst case is too horrible to contemplate. (The worst case occurs when all
keys have the same hash value.) Indeed, a computer programmer who uses
hashing had better be a believer in probability theory.
Let P be the number of times step H3 is performed when the algorithm
above is used to carry out a search. (Each execution of H3 is called a “probe”
in the table.) If we know P, we know how often each step is performed,
depending on whether the search is successful or unsuccessful:
    Step    Unsuccessful search    Successful search
    H1      1 time                 1 time
    H2      P + 1 times            P times
    H3      P times                P times
    H4      P times                P - 1 times
Thus the main quantity that governs the running time of the search procedure
is the number of probes, P.
We can get a good mental picture of the algorithm by imagining that we
are keeping an address book that is organized in a special way, with room for
only one entry per page. On the cover of the book we note down the page
number for the first entry in each of m lists; each name K determines the list
h(K) that it belongs to.
Every page inside the book refers to the successor
page in its list. The number of probes needed to find an address in such a
book is the number of pages we must consult.
If $n$ items have been inserted, their positions in the table depend only
on their respective hash values, $(h_1, h_2, \ldots, h_n)$. Each of the $m^n$ possible
sequences $(h_1, h_2, \ldots, h_n)$ is considered to be equally likely, and $P$ is a random
variable depending on such a sequence.
Case 1: The key is not present.
Let’s consider first the behavior of $P$ in an unsuccessful search, assuming
that $n$ records have previously been inserted into the hash table. In this case
the relevant probability space consists of $m^{n+1}$ elementary events
$$\omega = (h_1, h_2, \ldots, h_n; h_{n+1})$$
where $h_j$ is the hash value of the $j$th key inserted, and where $h_{n+1}$ is the hash
value of the key for which the search is unsuccessful. We assume that the
hash function $h$ has been chosen properly so that $\Pr(\omega) = 1/m^{n+1}$ for every
such $\omega$.
[Margin: “Check under the doormat.”]
For example, if $m = n = 2$, there are eight equally likely possibilities:
    h1  h2  h3 :  P
    1   1   1  :  2
    1   1   2  :  0
    1   2   1  :  1
    1   2   2  :  1
    2   1   1  :  1
    2   1   2  :  1
    2   2   1  :  0
    2   2   2  :  2
If $h_1 = h_2 = h_3$ we make two unsuccessful probes before concluding that the
new key $K$ is not present; if $h_1 = h_2 \ne h_3$ we make none; and so on. This list
of all possibilities shows that $P$ has a probability distribution given by the pgf
$(\frac14 + \frac12 z + \frac14 z^2) = (\frac12 + \frac12 z)^2$, when $m = n = 2$.
An unsuccessful search makes one probe for every item in list number
$h_{n+1}$, so we have the general formula
$$P = [h_1 = h_{n+1}] + [h_2 = h_{n+1}] + \cdots + [h_n = h_{n+1}]. \tag{8.84}$$
The probability that $h_j = h_{n+1}$ is $1/m$, for $1 \le j \le n$; so it follows that
$$EP = E[h_1 = h_{n+1}] + E[h_2 = h_{n+1}] + \cdots + E[h_n = h_{n+1}] = \frac{n}{m}.$$
Maybe we should do that more slowly: Let $X_j$ be the random variable
$$X_j = [h_j = h_{n+1}].$$
Then $P = X_1 + \cdots + X_n$, and $EX_j = 1/m$ for all $j \le n$; hence
$$EP = EX_1 + \cdots + EX_n = n/m.$$
Good: As we had hoped, the average number of probes is $1/m$ times what it
was without hashing. Furthermore the random variables $X_j$ are independent,
and they each have the same probability generating function
$$X_j(z) = \frac{m - 1 + z}{m};$$
therefore the pgf for the total number of probes in an unsuccessful search is
$$P(z) = X_1(z) \ldots X_n(z) = \Bigl(\frac{m - 1 + z}{m}\Bigr)^{n}.$$
This is a binomial distribution, with $p = 1/m$ and $q = (m-1)/m$; in other
words, the number of probes in an unsuccessful search behaves just like the
number of heads when we toss a biased coin whose probability of heads is
$1/m$ on each toss. Equation (8.61) tells us that the variance of $P$ is therefore
$$npq = \frac{n(m-1)}{m^2}.$$
When $m$ is large, the variance of $P$ is approximately $n/m$, so the standard
deviation is approximately $\sqrt{n/m}$.
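For small tables the binomial claim can be confirmed by listing every elementary event. The sketch below enumerates all $m^{n+1}$ hash sequences and compares the resulting distribution of P with the coefficients of $((m-1+z)/m)^n$; the function names are hypothetical and the approach is only feasible for small m and n.

```python
from fractions import Fraction
from itertools import product
from math import comb


def unsuccessful_probe_distribution(m, n):
    """Pr(P = p) for p = 0..n, found by enumerating all m^(n+1) equally likely events."""
    counts = [0] * (n + 1)
    for hs in product(range(1, m + 1), repeat=n + 1):
        *inserted, target = hs
        counts[sum(1 for h in inserted if h == target)] += 1
    total = m ** (n + 1)
    return [Fraction(c, total) for c in counts]


def binomial_pgf_coefficients(m, n):
    """Coefficients of ((m - 1 + z)/m)^n, the claimed pgf of P."""
    return [Fraction(comb(n, p) * (m - 1) ** (n - p), m ** n) for p in range(n + 1)]


def mean_and_variance(dist):
    """Mean and variance of a distribution given as a list of probabilities."""
    mean = sum(p * pr for p, pr in enumerate(dist))
    second = sum(p * p * pr for p, pr in enumerate(dist))
    return mean, second - mean * mean


if __name__ == "__main__":
    print(unsuccessful_probe_distribution(2, 2))
    print(mean_and_variance(unsuccessful_probe_distribution(3, 4)))
```

For m = n = 2 the enumeration gives the probabilities 1/4, 1/2, 1/4 from the eight-row table, and for m = 3, n = 4 the mean and variance come out as n/m and n(m-1)/m^2.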
Case 2: The key is present.
Now let’s look at successful searches. In this case the appropriate
probability space is a bit more complicated, depending on our application: We will
let $\Omega$ be the set of all elementary events
$$\omega = (h_1, \ldots, h_n; k), \tag{8.86}$$
where $h_j$ is the hash value for the $j$th key as before, and where $k$ is the index
of the key being sought (the key whose hash value is $h_k$). Thus we have
$1 \le h_j \le m$ for $1 \le j \le n$, and $1 \le k \le n$; there are $m^n \cdot n$ elementary
events $\omega$ in all.
Let $s_j$ be the probability that we are searching for the $j$th key that was
inserted into the table. Then
$$\Pr(\omega) = s_k/m^n \tag{8.87}$$
if $\omega$ is the event (8.86). (Some applications search most often for the items
that were inserted first, or for the items that were inserted last, so we will not
assume that each $s_j = 1/n$.) Notice that $\sum_{\omega\in\Omega} \Pr(\omega) = \sum_{k=1}^{n} s_k = 1$, hence
(8.87) defines a legal probability distribution.
The number of probes $P$ in a successful search is $p$ if key $K$ was the $p$th
key to be inserted into its list. Therefore
$$P = [h_1 = h_k] + [h_2 = h_k] + \cdots + [h_k = h_k];$$
or, if we let $X_j$ be the random variable $[h_j = h_k]$, we have
$$P = X_1 + X_2 + \cdots + X_k. \tag{8.88}$$
Suppose, for example, that we have $m = 10$ and $n = 16$, and that the hash
values have the following “random” pattern:
    $(h_1, \ldots, h_{16})$ = 3 1 4 1 5 9 2 6 5 3 5 8 9 7 9 3;
    $(P_1, \ldots, P_{16})$ = 1 1 1 2 1 1 1 1 2 2 3 1 2 1 3 3.
The number of probes $P_j$ needed to find the $j$th key is shown below $h_j$.
[Margin: “Where have I seen that pattern before?”]
Equation (8.88) represents $P$ as a sum of random variables, but we can’t
simply calculate $EP$ as $EX_1 + \cdots + EX_k$ because the quantity $k$ itself is a random
variable. What is the probability generating function for $P$? To answer this
question we should digress a moment to talk about conditional probability.
If $A$ and $B$ are events in a probability space, we say that the conditional
probability of $A$, given $B$, is
$$\Pr(\omega \in A \mid \omega \in B) = \frac{\Pr(\omega \in A \cap B)}{\Pr(\omega \in B)}.$$
[Margin: “Equation (8.43) was also a momentary digression.”]
For example, if $X$ and $Y$ are random variables, the conditional probability of
the event $X = x$, given that $Y = y$, is
$$\Pr(X = x \mid Y = y) = \frac{\Pr(X = x \text{ and } Y = y)}{\Pr(Y = y)}. \tag{8.90}$$
For any fixed $y$ in the range of $Y$, the sum of these conditional probabilities
over all $x$ in the range of $X$ is $\Pr(Y = y)/\Pr(Y = y) = 1$; therefore (8.90)
defines a probability distribution, and we can define a new random variable
‘$X|y$’ such that $\Pr((X|y) = x) = \Pr(X = x \mid Y = y)$.
If $X$ and $Y$ are independent, the random variable $X|y$ will be essentially
the same as $X$, regardless of the value of $y$, because $\Pr(X = x \mid Y = y)$ is equal
to $\Pr(X = x)$ by (8.5); that’s what independence means. But if $X$ and $Y$ are
dependent, the random variables $X|y$ and $X|y'$ need not resemble each other
in any way when $y \ne y'$.
If $X$ takes only nonnegative integer values, we can decompose its pgf into
a sum of conditional pgf’s with respect to any other random variable $Y$:
$$G_X(z) = \sum_{y \in Y(\Omega)} \Pr(Y = y)\,G_{X|y}(z). \tag{8.91}$$
This holds because the coefficient of $z^x$ on the left side is $\Pr(X = x)$, for all
$x \in X(\Omega)$, and on the right it is
$$\sum_{y \in Y(\Omega)} \Pr(Y = y)\Pr(X = x \mid Y = y) = \sum_{y \in Y(\Omega)} \Pr(X = x \text{ and } Y = y) = \Pr(X = x).$$
For example, if $X$ is the product of the spots on two fair dice and if $Y$ is the
sum of the spots, the pgf for $X|6$ is
$$G_{X|6}(z) = \tfrac25 z^5 + \tfrac25 z^8 + \tfrac15 z^9$$
because the conditional probabilities for $Y = 6$ consist of five equally probable
events {⚀⚄, ⚁⚃, ⚂⚂, ⚃⚁, ⚄⚀}. Equation (8.91) in this case
reduces to
$$G_X(z) = \tfrac1{36} G_{X|2}(z) + \tfrac2{36} G_{X|3}(z) + \tfrac3{36} G_{X|4}(z) + \tfrac4{36} G_{X|5}(z) + \tfrac5{36} G_{X|6}(z) + \tfrac6{36} G_{X|7}(z) + \tfrac5{36} G_{X|8}(z) + \tfrac4{36} G_{X|9}(z) + \tfrac3{36} G_{X|10}(z) + \tfrac2{36} G_{X|11}(z) + \tfrac1{36} G_{X|12}(z),$$
a formula that is obvious once you understand it. (End of digression.)
[Margin: “Oh, now I understand what mathematicians mean when they say something is ‘obvious,’ ‘clear,’ or ‘trivial.’”]
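The dice example is small enough to tabulate exactly. The sketch below builds the joint distribution of the product X and the sum Y, derives each conditional pgf as a dictionary of coefficients, and recombines them as in (8.91); the function name `conditional_pgfs` and the dictionary representation are assumptions made for this illustration.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def conditional_pgfs():
    """Return (pr_y, g_cond, g_x), where g_cond[y][x] = Pr(X = x | Y = y)."""
    joint = defaultdict(Fraction)
    for a, b in product(range(1, 7), repeat=2):
        joint[(a * b, a + b)] += Fraction(1, 36)  # X is the product, Y is the sum
    pr_y = defaultdict(Fraction)
    g_x = defaultdict(Fraction)
    for (x, y), pr in joint.items():
        pr_y[y] += pr
        g_x[x] += pr
    g_cond = defaultdict(dict)
    for (x, y), pr in joint.items():
        g_cond[y][x] = pr / pr_y[y]               # definition (8.90)
    return pr_y, g_cond, g_x


def recombine(pr_y, g_cond):
    """Right-hand side of (8.91): the sum over y of Pr(Y = y) times the pgf of X|y."""
    total = defaultdict(Fraction)
    for y, coefficients in g_cond.items():
        for x, pr in coefficients.items():
            total[x] += pr_y[y] * pr
    return total


if __name__ == "__main__":
    pr_y, g_cond, g_x = conditional_pgfs()
    print(g_cond[6])
    print(dict(recombine(pr_y, g_cond)) == dict(g_x))
```

The entry `g_cond[6]` holds the coefficients 2/5, 2/5, and 1/5 on $z^5$, $z^8$, and $z^9$, and recombining all eleven conditional pgf's gives back the unconditional distribution of X.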
In the case of hashing, (8.91) tells us how to write down the pgf for probes
in a successful search, if we let $X = P$ and $Y = K$. For any fixed $k$ between 1
and $n$, the random variable $P|k$ is defined as a sum of independent random
variables $X_1 + \cdots + X_k$; this is (8.88). So it has the pgf
$$G_{P|k}(z) = \Bigl(\frac{m - 1 + z}{m}\Bigr)^{k-1} z.$$
Therefore the pgf for $P$ itself is
$$G_P(z) = \sum_{k=1}^{n} s_k\,G_{P|k}(z) = z\,S\Bigl(\frac{m - 1 + z}{m}\Bigr), \tag{8.92}$$
where
$$S(z) = s_1 + s_2 z + s_3 z^2 + \cdots + s_n z^{n-1} \tag{8.93}$$
is the pgf for the search probabilities $s_k$ (divided by $z$ for convenience).
Good. We have a probability generating function for $P$; we can now find
the mean and variance by differentiation. It’s somewhat easier to remove the
$z$ factor first, as we’ve done before, thus finding the mean and variance of
$P - 1$ instead:
$$F(z) = G_P(z)/z = S\Bigl(\frac{m - 1 + z}{m}\Bigr);$$
$$F'(z) = \frac1m\,S'\Bigl(\frac{m - 1 + z}{m}\Bigr);$$
$$F''(z) = \frac1{m^2}\,S''\Bigl(\frac{m - 1 + z}{m}\Bigr).$$
Therefore
$$EP = 1 + \mathrm{Mean}(F) = 1 + F'(1) = 1 + m^{-1}\,\mathrm{Mean}(S); \tag{8.94}$$
$$VP = \mathrm{Var}(F) = F''(1) + F'(1) - F'(1)^2 = m^{-2}S''(1) + m^{-1}S'(1) - m^{-2}S'(1)^2 = m^{-2}\,\mathrm{Var}(S) + (m^{-1} - m^{-2})\,\mathrm{Mean}(S). \tag{8.95}$$
These are general formulas expressing the mean and variance of the
number of probes $P$ in terms of the mean and variance of the assumed search
distribution $S$.
For example, suppose we have $s_k = 1/n$ for $1 \le k \le n$. This means
we are doing a purely “random” successful search, with all keys in the table
equally likely. Then $S(z)$ is the uniform probability distribution $U_n(z)$ in
(8.32), and we have $\mathrm{Mean}(S) = (n-1)/2$, $\mathrm{Var}(S) = (n^2-1)/12$. Hence
$$EP = \frac{n-1}{2m} + 1; \tag{8.96}$$
$$VP = \frac{n^2-1}{12m^2} + \frac{(m-1)(n-1)}{2m^2} = \frac{(n-1)(6m+n-5)}{12m^2}. \tag{8.97}$$
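The closed forms for the uniform case can be compared against a direct enumeration of the probability space. The sketch below walks over all $m^n \cdot n$ elementary events $(h_1, \ldots, h_n; k)$ with $s_k = 1/n$; the function names are hypothetical, and the enumeration is only practical for small m and n.

```python
from fractions import Fraction
from itertools import product


def probes(hs, k):
    """P(h_1, ..., h_n; k): the number of j <= k with h_j = h_k (k is 1-based)."""
    return sum(1 for j in range(k) if hs[j] == hs[k - 1])


def successful_search_stats(m, n):
    """Exact (EP, VP) when every key is equally likely to be sought."""
    weight = Fraction(1, n * m ** n)  # Pr(omega) = s_k / m^n with s_k = 1/n
    mean = Fraction(0)
    second = Fraction(0)
    for hs in product(range(1, m + 1), repeat=n):
        for k in range(1, n + 1):
            p = probes(hs, k)
            mean += weight * p
            second += weight * p * p
    return mean, second - mean * mean


def closed_form(m, n):
    """EP = 1 + (n-1)/(2m) and VP = (n-1)(6m+n-5)/(12 m^2)."""
    return (1 + Fraction(n - 1, 2 * m),
            Fraction((n - 1) * (6 * m + n - 5), 12 * m * m))


if __name__ == "__main__":
    for m, n in ((2, 3), (3, 4), (4, 3)):
        print(m, n, successful_search_stats(m, n), closed_form(m, n))
```

For m = 2 and n = 3, for instance, both computations give a mean of 3/2 probes and a variance of 5/12.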
Once again we have gained the desired speedup factor of $1/m$. If $m = n/\ln n$
and $n \to \infty$, the average number of probes per successful search in this case
is about $\frac12 \ln n$, and the standard deviation is asymptotically $(\ln n)/\sqrt{12}$.
On the other hand, we might suppose that $s_k = (kH_n)^{-1}$ for $1 \le k \le n$;
this distribution is called “Zipf’s law.” Then $\mathrm{Mean}(G) = n/H_n$ and
$\mathrm{Var}(G) = \frac12 n(n+1)/H_n - n^2/H_n^2$. The average number of probes for $m = n/\ln n$ as
$n \to \infty$ is approximately 2, with standard deviation asymptotic to $\sqrt{\ln n}/\sqrt2$.
In both cases the analysis allows the cautious souls among us, who fear
the worst case, to rest easily: Chebyshev’s inequality tells us that the lists
will be nice and short, except in extremely rare cases.
Case 2, continued: Variants of the variance.
We have just computed the variance of the number of probes in a
successful search, by considering $P$ to be a random variable over a probability space
with $m^n \cdot n$ elements $(h_1, \ldots, h_n; k)$. But we could have adopted another point
of view: Each pattern $(h_1, \ldots, h_n)$ of hash values defines a random variable
$P|(h_1, \ldots, h_n)$, representing the probes we make in a successful search of a
particular hash table on $n$ given keys. The average value of $P|(h_1, \ldots, h_n)$,
$$A(h_1, \ldots, h_n) = \sum_{p=1}^{n} p \cdot \Pr\bigl((P|(h_1, \ldots, h_n)) = p\bigr), \tag{8.98}$$
can be said to represent the running time of a successful search. This quantity
$A(h_1, \ldots, h_n)$ is a random variable that depends only on $(h_1, \ldots, h_n)$, not on
the final component $k$; we can write it in the form
$$A(h_1, \ldots, h_n) = \sum_{k=1}^{n} s_k\,P(h_1, \ldots, h_n; k),$$
since $P|(h_1, \ldots, h_n) = p$ with probability
$$\frac{\sum_{k=1}^{n} \Pr\bigl(P(h_1, \ldots, h_n; k) = p\bigr)}{\sum_{k=1}^{n} \Pr(h_1, \ldots, h_n; k)} = \frac{\sum_{k=1}^{n} m^{-n} s_k\,[P(h_1, \ldots, h_n; k) = p]}{\sum_{k=1}^{n} m^{-n} s_k} = \sum_{k=1}^{n} s_k\,[P(h_1, \ldots, h_n; k) = p].$$
[Margin: “OK, gang, time to put on your skim suits again.” -Friendly TA]
The mean value of $A(h_1, \ldots, h_n)$, obtained by summing over all $m^n$
possibilities $(h_1, \ldots, h_n)$ and dividing by $m^n$, will be the same as the mean value
we obtained before in (8.94). But the variance of $A(h_1, \ldots, h_n)$ is something
different; this is a variance of $m^n$ averages, not a variance of $m^n \cdot n$ probe
counts. For example, if $m = 1$ (so that there is only one list), the “average”
value $A(h_1, \ldots, h_n) = A(1, \ldots, 1)$ is actually constant, so its variance $VA$ is
zero; but the number of probes in a successful search is not constant, so the
variance $VP$ is nonzero.
[Margin: “But the VP is nonzero only in an election year.”]
We can illustrate this difference between variances by carrying out the
calculations for general $m$ and $n$ in the simplest case, when $s_k = 1/n$ for
$1 \le k \le n$. In other words, we will assume temporarily that there is a uniform
distribution of search keys. Any given sequence of hash values $(h_1, \ldots, h_n)$
defines $m$ lists that contain respectively $(n_1, n_2, \ldots, n_m)$ entries for some
numbers $n_j$, where
$$n_1 + n_2 + \cdots + n_m = n.$$
A successful search in which each of the $n$ keys in the table is equally likely
will have an average running time of
$$A(h_1, \ldots, h_n) = \frac{(1 + \cdots + n_1) + (1 + \cdots + n_2) + \cdots + (1 + \cdots + n_m)}{n} = \frac{n_1(n_1+1) + n_2(n_2+1) + \cdots + n_m(n_m+1)}{2n} = \frac{n_1^2 + n_2^2 + \cdots + n_m^2 + n}{2n}$$
probes. Our goal is to calculate the variance of this quantity $A(h_1, \ldots, h_n)$,
over the probability space consisting of all $m^n$ sequences $(h_1, \ldots, h_n)$.
The calculations will be simpler, it turns out, if we compute the variance
of a slightly different quantity,
$$B(h_1, \ldots, h_n) = \binom{n_1}{2} + \binom{n_2}{2} + \cdots + \binom{n_m}{2}.$$
We have
$$A(h_1, \ldots, h_n) = 1 + B(h_1, \ldots, h_n)/n,$$
hence the mean and variance of $A$ satisfy
$$EA = 1 + \frac{EB}{n}; \qquad VA = \frac{VB}{n^2}. \tag{8.99}$$
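The probability that the list sizes will be $n_1, n_2, \ldots, n_m$ is the multinomial
coefficient
$$\binom{n}{n_1, n_2, \ldots, n_m} = \frac{n!}{n_1!\,n_2! \ldots n_m!}$$
divided by $m^n$; hence the pgf for $B(h_1, \ldots, h_n)$ is
$$B_n(z) = \sum_{\substack{n_1, n_2, \ldots, n_m \ge 0 \\ n_1 + n_2 + \cdots + n_m = n}} \binom{n}{n_1, n_2, \ldots, n_m}\, z^{\binom{n_1}{2} + \binom{n_2}{2} + \cdots + \binom{n_m}{2}}\, m^{-n}.$$
This sum looks a bit scary to inexperienced eyes, but our experiences in
Chapter 7 have taught us to recognize it as an $m$-fold convolution. Indeed, if
we consider the exponential super-generating function
$$G(w, z) = \sum_{n \ge 0} B_n(z)\,\frac{m^n w^n}{n!},$$
we can readily verify that $G(w, z)$ is simply an $m$th power:
$$G(w, z) = \Bigl(\sum_{k \ge 0} z^{\binom{k}{2}}\,\frac{w^k}{k!}\Bigr)^{m}.$$
As a check, we can try setting $z = 1$; we get $G(w, 1) = (e^w)^m$, so the coefficient
of $m^n w^n/n!$ is $B_n(1) = 1$.
If we knew the values of $B_n'(1)$ and $B_n''(1)$, we would be able to calculate
$\mathrm{Var}(B_n)$. So we take partial derivatives of $G(w, z)$ with respect to $z$:
$$\frac{\partial}{\partial z} G(w, z) = \sum_{n \ge 0} B_n'(z)\,\frac{m^n w^n}{n!} = m\Bigl(\sum_{k \ge 0} z^{\binom{k}{2}}\,\frac{w^k}{k!}\Bigr)^{m-1} \sum_{k \ge 0} \binom{k}{2} z^{\binom{k}{2} - 1}\,\frac{w^k}{k!};$$
$$\frac{\partial^2}{\partial z^2} G(w, z) = \sum_{n \ge 0} B_n''(z)\,\frac{m^n w^n}{n!} = m(m-1)\Bigl(\sum_{k \ge 0} z^{\binom{k}{2}}\,\frac{w^k}{k!}\Bigr)^{m-2} \Bigl(\sum_{k \ge 0} \binom{k}{2} z^{\binom{k}{2} - 1}\,\frac{w^k}{k!}\Bigr)^{2} + m\Bigl(\sum_{k \ge 0} z^{\binom{k}{2}}\,\frac{w^k}{k!}\Bigr)^{m-1} \sum_{k \ge 0} \binom{k}{2}\Bigl(\binom{k}{2} - 1\Bigr) z^{\binom{k}{2} - 2}\,\frac{w^k}{k!}.$$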
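Complicated, yes; but everything simplifies greatly when we set $z = 1$. For
example, we have
$$\sum_{n \ge 0} B_n'(1)\,\frac{m^n w^n}{n!} = m e^{(m-1)w} \sum_{k \ge 2} \frac{w^k}{2(k-2)!} = m e^{(m-1)w} \sum_{k \ge 0} \frac{w^{k+2}}{2\,k!} = \frac{m w^2 e^{(m-1)w} e^{w}}{2} = \sum_{n \ge 0} \frac{(mw)^{n+2}}{2m\,n!} = \sum_{n \ge 0} \frac{n(n-1)\,m^n w^n}{2m\,n!},$$
and it follows that
$$B_n'(1) = \binom{n}{2}\frac{1}{m}.$$
The expression for $EA$ in (8.99) now gives $EA = 1 + (n-1)/2m$, in agreement
with (8.96).
The formula for $B_n''(1)$ involves the similar sum
$$\sum_{k \ge 0} \binom{k}{2}\Bigl(\binom{k}{2} - 1\Bigr)\frac{w^k}{k!} = \frac14 \sum_{k \ge 0} \frac{(k+1)k(k-1)(k-2)\,w^k}{k!} = \frac14 \sum_{k \ge 3} \frac{(k+1)\,w^k}{(k-3)!} = \frac14 \sum_{k \ge 0} \frac{(k+4)\,w^{k+3}}{k!} = \bigl(\tfrac14 w^4 + w^3\bigr)e^w;$$
hence we find that
$$\sum_{n \ge 0} B_n''(1)\,\frac{m^n w^n}{n!} = m(m-1)e^{w(m-2)}\bigl(\tfrac12 w^2 e^w\bigr)^2 + m e^{w(m-1)}\bigl(\tfrac14 w^4 + w^3\bigr)e^w = m e^{wm}\bigl(\tfrac14 m w^4 + w^3\bigr);$$
$$B_n''(1) = \binom{n}{2}\Bigl(\binom{n}{2} - 1\Bigr)\frac{1}{m^2}. \tag{8.101}$$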
Now we can put all the pieces together and evaluate the desired variance $VA$.
Massive cancellation occurs, and the result is surprisingly simple:
$$VA = \frac{VB}{n^2} = \frac{B_n''(1) + B_n'(1) - B_n'(1)^2}{n^2} = \frac{n(n-1)}{m^2 n^2}\Bigl(\frac{(n+1)(n-2)}{4} + \frac{m}{2} - \frac{n(n-1)}{4}\Bigr) = \frac{(m-1)(n-1)}{2m^2 n}. \tag{8.102}$$
When such “coincidences” occur, we suspect that there’s a mathematical
reason; there might be another way to attack the problem, explaining why
the answer has such a simple form. And indeed, there is another approach (in
exercise 60), which shows that the variance of the average successful search
has the general form
$$VA = \frac{m-1}{m^2} \sum_{k=1}^{n} s_k^2\,(k-1) \tag{8.103}$$
when $s_k$ is the probability that the $k$th-inserted element is being sought.
Equation (8.102) is the special case $s_k = 1/n$ for $1 \le k \le n$.
Besides the variance of the average, we might also consider the average of
the variance. In other words, each sequence $(h_1, \ldots, h_n)$ that defines a hash
table also defines a probability distribution for successful searching, and the
variance of this probability distribution tells how spread out the number of
probes will be in different successful searches. For example, let’s go back to
the case where we inserted $n = 16$ things into $m = 10$ lists:
    $(h_1, \ldots, h_{16})$ = 3 1 4 1 5 9 2 6 5 3 5 8 9 7 9 3
    $(P_1, \ldots, P_{16})$ = 1 1 1 2 1 1 1 1 2 2 3 1 2 1 3 3
[Margin: “Where have I seen that pattern before?” “Where have I seen that graffito before?” “IqvP,”]
A successful search in the resulting hash table has the pgf
$$G(3, 1, 4, 1, \ldots, 3) = \sum_{k=1}^{16} s_k\,z^{P(3,1,4,1,\ldots,3;k)} = s_1 z + s_2 z + s_3 z + s_4 z^2 + \cdots + s_{16} z^3.$$
We have just considered the average number of probes in a successful search
of this table, namely $A(3, 1, 4, 1, \ldots, 3) = \mathrm{Mean}(G(3, 1, 4, 1, \ldots, 3))$. We can
also consider the variance,
$$s_1 \cdot 1^2 + s_2 \cdot 1^2 + s_3 \cdot 1^2 + s_4 \cdot 2^2 + \cdots + s_{16} \cdot 3^2 - (s_1 \cdot 1 + s_2 \cdot 1 + s_3 \cdot 1 + s_4 \cdot 2 + \cdots + s_{16} \cdot 3)^2.$$
This variance is a random variable, depending on $(h_1, \ldots, h_n)$, so it is natural
to consider its average value.
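In other words, there are three natural kinds of variance that we may
wish to know, in order to understand the behavior of a successful search: The
overall variance of the number of probes, taken over all $(h_1, \ldots, h_n)$ and $k$;
the variance of the average number of probes, where the average is taken
over all $k$ and the variance is then taken over all $(h_1, \ldots, h_n)$; and the average
of the variance of the number of the probes, where the variance is taken over
all $k$ and the average is then taken over all $(h_1, \ldots, h_n)$. In symbols, the
overall variance is
$$VP = \sum_{1 \le h_1, \ldots, h_n \le m}\ \sum_{k=1}^{n} \frac{s_k}{m^n}\,P(h_1, \ldots, h_n; k)^2 - \Bigl(\sum_{1 \le h_1, \ldots, h_n \le m}\ \sum_{k=1}^{n} \frac{s_k}{m^n}\,P(h_1, \ldots, h_n; k)\Bigr)^{2};$$
the variance of the average is
$$VA = \sum_{1 \le h_1, \ldots, h_n \le m} \frac{1}{m^n}\Bigl(\sum_{k=1}^{n} s_k\,P(h_1, \ldots, h_n; k)\Bigr)^{2} - \Bigl(\sum_{1 \le h_1, \ldots, h_n \le m} \frac{1}{m^n} \sum_{k=1}^{n} s_k\,P(h_1, \ldots, h_n; k)\Bigr)^{2};$$
and the average of the variance is
$$AV = \sum_{1 \le h_1, \ldots, h_n \le m} \frac{1}{m^n}\Bigl(\sum_{k=1}^{n} s_k\,P(h_1, \ldots, h_n; k)^2 - \Bigl(\sum_{k=1}^{n} s_k\,P(h_1, \ldots, h_n; k)\Bigr)^{2}\Bigr).$$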
It turns out that these three quantities are interrelated in a simple way:
$$VP = VA + AV. \tag{8.104}$$
In fact, conditional probability distributions always satisfy the identity
$$VX = V(E(X|Y)) + E(V(X|Y)) \tag{8.105}$$
if $X$ and $Y$ are random variables in any probability space and if $X$ takes real
values. (This identity is proved in exercise 22.) Equation (8.104) is the
special case where $X$ is the number of probes in a successful search and $Y$ is
the sequence of hash values $(h_1, \ldots, h_n)$.
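The three variances can be computed side by side for a small table. The sketch below enumerates every hash sequence, forms the per-table mean and second moment of the probe count, and combines them into VP, VA, and AV; the function name `three_variances` and the optional list `s` of search probabilities are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import product


def three_variances(m, n, s=None):
    """Return (VP, VA, AV) for search probabilities s[0..n-1] (uniform if omitted)."""
    if s is None:
        s = [Fraction(1, n)] * n
    weight = Fraction(1, m ** n)   # every hash sequence is equally likely
    e_mean = Fraction(0)           # average over tables of the per-table mean
    e_mean_sq = Fraction(0)        # average over tables of the squared per-table mean
    e_second = Fraction(0)         # average over tables of the per-table second moment
    for hs in product(range(1, m + 1), repeat=n):
        mean_h = Fraction(0)
        second_h = Fraction(0)
        for k in range(1, n + 1):
            p = sum(1 for j in range(k) if hs[j] == hs[k - 1])  # P(h_1..h_n; k)
            mean_h += s[k - 1] * p
            second_h += s[k - 1] * p * p
        e_mean += weight * mean_h
        e_mean_sq += weight * mean_h ** 2
        e_second += weight * second_h
    vp = e_second - e_mean ** 2
    va = e_mean_sq - e_mean ** 2
    av = e_second - e_mean_sq
    return vp, va, av


if __name__ == "__main__":
    print(three_variances(2, 3))
    print(three_variances(2, 3, [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]))
```

With m = 2, n = 3, and uniform search probabilities this gives VP = 5/12, VA = 1/12, and AV = 1/3, so that VP = VA + AV, and VA agrees with $(m-1)(n-1)/(2m^2n)$ from (8.102).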
The general equation (8.105) needs to be understood carefully, because
the notation tends to conceal the different random variables and probability
spaces in which expectations and variances are being calculated. For each $y$
in the range of $Y$, we have defined the random variable $X|y$ in (8.90), and this
random variable has an expected value $E(X|y)$ depending on $y$. Now $E(X|Y)$
denotes the random variable whose values are $E(X|y)$ as $y$ ranges over all
possible values of $Y$, and $V(E(X|Y))$ is the variance of this random variable
with respect to the probability distribution of $Y$. Similarly, $E(V(X|Y))$ is the
average of the random variables $V(X|y)$ as $y$ varies. On the left of (8.105)
is $VX$, the unconditional variance of $X$. Since variances are nonnegative, we
always have
$$VX \ge V(E(X|Y)) \quad\text{and}\quad VX \ge E(V(X|Y)). \tag{8.106}$$
[Margin: “(Now is a good time to do warmup exercise 6.)”]
Case 1, again: Unsuccessful search revisited.
Let’s bring our microscopic examination of hashing to a close by doing one
more calculation typical of algorithmic analysis. This time we’ll look more
closely at the total running time associated with an unsuccessful search,
assuming that the computer will insert the previously unknown key into its
memory.
[Margin: “P is still the number of probes.”]
The insertion process in (8.83) has two cases, depending on whether $j$ is
negative or zero. We have $j < 0$ if and only if $P = 0$, since a negative value
comes from the FIRST entry of an empty list. Thus, if the list was previously
empty, we have $P = 0$ and we must set FIRST[$h_{n+1}$] := n + 1. (The new
record will be inserted into position $n + 1$.) Otherwise we have $P > 0$ and we
must set a NEXT entry to $n + 1$. These two cases may take different amounts
of time; therefore the total running time for an unsuccessful search has the
form
$$T = \alpha + \beta P + \delta\,[P = 0], \tag{8.107}$$
where $\alpha$, $\beta$, and $\delta$ are constants that depend on the computer being used and
on the way in which hashing is encoded in that machine’s internal language.
It would be nice to know the mean and variance of $T$, since such information
is more relevant in practice than the mean and variance of $P$.
So far we have used probability generating functions only in connection
with random variables that take nonnegative integer values. But it turns out
that we can deal in essentially the same way with
$$G_X(z) = \sum_{\omega \in \Omega} \Pr(\omega)\,z^{X(\omega)}$$
when $X$ is any real-valued random variable, because the essential
characteristics of $X$ depend only on the behavior of $G_X$ near $z = 1$, where powers of $z$ are
well defined. For example, the running time (8.107) of an unsuccessful search
is a random variable, defined on the probability space of equally likely hash
values $(h_1, \ldots, h_n; h_{n+1})$ with $1 \le h_j \le m$; we can consider the series
$$G_T(z) = \frac{1}{m^{n+1}} \sum_{h_1=1}^{m} \cdots \sum_{h_n=1}^{m}\ \sum_{h_{n+1}=1}^{m} z^{\alpha + \beta P(h_1, \ldots, h_n; h_{n+1}) + \delta[P(h_1, \ldots, h_n; h_{n+1}) = 0]}$$
412 DISCRETE PROBABILITY
to be a pgf even when
01,
(3,
and 6 are not integers. (In fact, the parameters
a,
(3,
6 are physical quantitieis that have dimensions of time; they aren’t even
pure numbers! Yet we can use them in the exponent of 2.) We can still
calculate the mean and variance of T, by evaluating G;( 1) and Gf’( 1) and
combining these values in the usual way.
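The generating function for P instead of T is
$$P(z) = \Bigl(\frac{m-1+z}{m}\Bigr)^{n} = \sum_{p\ge0} \Pr(P=p)\, z^{p}.$$
Therefore we have
$$G_T(z) = z^{\alpha}\Bigl((z^{\delta}-1)\Pr(P=0) + \sum_{p\ge0}\Pr(P=p)\,z^{\beta p}\Bigr) = z^{\alpha}\Bigl((z^{\delta}-1)\Bigl(\frac{m-1}{m}\Bigr)^{n} + \Bigl(\frac{m-1+z^{\beta}}{m}\Bigr)^{n}\Bigr).$$
The determination of $\mathrm{Mean}(G_T)$ and $\mathrm{Var}(G_T)$ is now routine:
$$\mathrm{Mean}(G_T) = G_T'(1) = \alpha + \beta\frac{n}{m} + \delta\Bigl(\frac{m-1}{m}\Bigr)^{n}; \tag{8.108}$$
$$G_T''(1) = \alpha(\alpha-1) + 2\alpha\beta\frac{n}{m} + \beta(\beta-1)\frac{n}{m} + \beta^2\frac{n(n-1)}{m^2} + 2\alpha\delta\Bigl(\frac{m-1}{m}\Bigr)^{n} + \delta(\delta-1)\Bigl(\frac{m-1}{m}\Bigr)^{n};$$
$$\mathrm{Var}(G_T) = G_T''(1) + G_T'(1) - G_T'(1)^2 = \beta^2\frac{n(m-1)}{m^2} - 2\beta\delta\Bigl(\frac{m-1}{m}\Bigr)^{n}\frac{n}{m} + \delta^2\Bigl(\Bigl(\frac{m-1}{m}\Bigr)^{n} - \Bigl(\frac{m-1}{m}\Bigr)^{2n}\Bigr). \tag{8.109}$$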
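For small tables, the closed forms for the mean and variance of T can be compared against a brute-force average over all $m^{n+1}$ equally likely hash sequences. The sketch below is only a sanity check under stated assumptions: P is taken to be the number of earlier keys whose hash value equals $h_{n+1}$ (which is what the binomial pgf P(z) describes), and the function names and sample values of α, β, δ are hypothetical.

```python
from fractions import Fraction
from itertools import product

def brute_force(m, n, alpha, beta, delta):
    """Exact mean and variance of T = alpha + beta*P + delta*[P=0],
    averaging over all hash sequences (h_1, ..., h_n; h_{n+1})."""
    total = total_sq = Fraction(0)
    count = 0
    for h in product(range(m), repeat=n + 1):
        p = sum(1 for hj in h[:n] if hj == h[n])  # keys already in list h_{n+1}
        t = alpha + beta * p + delta * (1 if p == 0 else 0)
        total += t
        total_sq += t * t
        count += 1
    mean = total / count
    return mean, total_sq / count - mean ** 2

def closed_form(m, n, alpha, beta, delta):
    """Mean and variance of T from the closed forms derived above."""
    q = Fraction(m - 1, m) ** n          # Pr(P = 0)
    r = Fraction(n, m)                   # mean of P
    mean = alpha + beta * r + delta * q
    var = (beta ** 2 * Fraction(n * (m - 1), m * m)
           - 2 * beta * delta * q * r
           + delta ** 2 * (q - q * q))
    return mean, var

# Hypothetical timing constants, chosen as non-integers on purpose.
args = (3, 4, Fraction(5, 2), Fraction(3, 2), Fraction(7, 3))
print(brute_force(*args))
print(closed_form(*args))
```

Exact rational arithmetic is used so that the two results can be compared for equality rather than approximate agreement.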
In Chapter 9 we will learn how to estimate quantities like this when m and n are large. If, for example, m = n and n → ∞, the techniques of Chapter 9 will show that the mean and variance of T are respectively $\alpha + \beta + \delta e^{-1} + O(n^{-1})$ and $\beta^2 - 2\beta\delta e^{-1} + \delta^2(e^{-1} - e^{-2}) + O(n^{-1})$. If m = n/ln n and n → ∞ the corresponding results are
$$\mathrm{Mean}(G_T) = \beta\ln n + \alpha + \delta/n + O\bigl((\log n)^2/n^2\bigr);$$
$$\mathrm{Var}(G_T) = \beta^2\ln n - \bigl((\beta\ln n)^2 + 2\beta\delta\ln n - \delta^2\bigr)/n + O\bigl((\log n)^3/n^2\bigr).$$
Exercises
Warmups
1 What’s the probability of doubles in the probability distribution $\Pr_{01}$ of (8.3), when one die is fair and the other is loaded? What’s the probability that S = 7 is rolled?
2 What’s the probability that the top and bottom cards of a randomly shuffled deck are both aces? (All 52! permutations have probability 1/52!.)
3 Stanford’s Concrete Math students were asked in 1979 to flip coins until they got heads twice in succession, and to report the number of flips required. The answers were
3, 2, 3, 5, 10, 2, 6, 6, 9, 2.
Princeton’s Concrete Math students were asked in 1987 to do a similar thing, with the following results:
10, 2, 10, 7, 5, 2, 10, 6, 10, 2.
Estimate the mean and variance, based on (a) the Stanford sample; (b) the Princeton sample.
[Margin note: Why only ten numbers? The other students either weren’t empiricists or they were just too flipped out.]
4 Let H(z) = F(z)/G(z), where F(1) = G(1) = 1. Prove that
Mean(H) = Mean(F) − Mean(G),
Var(H) = Var(F) − Var(G),
in analogy with (8.38) and (8.39), if the indicated derivatives exist at z = 1.
5
Suppose Alice and Bill play the game (8.78) with a biased coin that comes
up heads with probability p. Is there a value of p for which the game
becomes fair?
6
What does the conditional variance law (8.105) reduce to, when X and Y
are independent random variables?
Basics
7 Show that if two dice are loaded with the same probability distribution, the probability of doubles is always at least 1/6.
8 Let A and B be events such that A ∪ B = Ω. Prove that
$$\Pr(\omega\in A\cap B) = \Pr(\omega\in A)\Pr(\omega\in B) - \Pr(\omega\notin A)\Pr(\omega\notin B).$$
9
Prove or disprove: If X and Y are independent random variables, then so
are F(X) and G(Y), when F and G are any functions.
10
What’s the maximum number of elements that can be medians of a ran-
dom variable X, according to definition (8.7)?
11
Construct a random variable that has finite mean and infinite variance.
12 a If P(z) is the pgf for the random variable X, prove that
$$\Pr(X\le r) \le x^{-r}P(x) \quad\text{for } 0 < x \le 1;$$
$$\Pr(X\ge r) \le x^{-r}P(x) \quad\text{for } x \ge 1.$$
(These important relations are called the tail inequalities.)
b In the special case $P(z) = (1+z)^n/2^n$, use the first tail inequality to prove that
$$\sum_{k\le\alpha n}\binom{n}{k} \le 1/\alpha^{\alpha n}(1-\alpha)^{(1-\alpha)n}$$
when 0 < α < 1/2.
13 If $X_1, \ldots, X_{2n}$ are independent random variables with the same distribution, and if α is any real number whatsoever, prove that
$$\Pr\biggl(\Bigl|\frac{X_1+\cdots+X_{2n}}{2n} - \alpha\Bigr| \le \Bigl|\frac{X_1+\cdots+X_n}{n} - \alpha\Bigr|\biggr) \ge \frac12.$$
14 Let F(z) and G(z) be probability generating functions, and let
H(z) = pF(z) + qG(z)
where p + q = 1. (This is called a mixture of F and G; it corresponds to flipping a coin and choosing probability distribution F or G depending on whether the coin comes up heads or tails.) Find the mean and variance of H in terms of p, q, and the mean and variance of F and G.
15 If F(z) and G(z) are probability generating functions, we can define an-
other pgf H(z) by “composition”:
H(z) =
F(G(z)).
Express Mean(H) and Var(H) in terms of Mean(F), Var(F), Mean(G),
and Var(G). (Equation (8.92) is a special case.)
16 Find a closed form for the super generating function $\sum_{n\ge0} F_n(z)w^n$, when $F_n(z)$ is the football-fixation generating function defined in (8.53).
17 Let $X_{n,p}$ and $Y_{n,p}$ have the binomial and negative binomial distributions, respectively, with parameters (n, p). (These distributions are defined in (8.57) and (8.60).) Prove that $\Pr(Y_{n,p} \le m) = \Pr(X_{m+n,p} \ge n)$. What identity in binomial coefficients does this imply?
18 A random variable X is said to have the Poisson distribution with mean μ if $\Pr(X=k) = e^{-\mu}\mu^k/k!$ for all k ≥ 0.
[Margin note: The distribution of fish per unit volume of water.]
a What is the pgf of such a random variable?
b What are its mean, variance, and other cumulants?
19 Continuing the previous exercise, let $X_1$ be a random Poisson variable with mean $\mu_1$, and let $X_2$ be a random Poisson variable with mean $\mu_2$, independent of $X_1$.
a What is the probability that $X_1 + X_2 = n$?
b What are the mean, variance, and other cumulants of $2X_1 + 3X_2$?
20 Prove (8.74) and (8.75), the general formulas for mean and variance of
the time needed to wait for a given pattern of heads and tails.
21 What does the value of N represent, if H and T are both set equal to 1/2 in (8.77)?
22 Prove (8.105), the law of conditional expectations and variances.
Homework exercises
23 Let $\Pr_{00}$ be the probability distribution of two fair dice, and let $\Pr_{11}$ be the probability distribution of two loaded dice as given in (8.2). Find all events A such that $\Pr_{00}(A) = \Pr_{11}(A)$. Which of these events depend only on the random variable S? (A probability space with $\Omega = D^2$ has $2^{36}$ events; only $2^{11}$ of those events depend on S alone.)
24 Player J rolls
2n+
1 fair dice and removes those that come up q . Player
K then calls a number between 1 and 6, rolls the remaining dice, and
removes those that show the number called. This process is repeated
until no dice remain. The player who has removed the most total dice
(n + 1 or more) is the winner.
a
What are the mean and variance of the total number of dice that
J removes? Hint: The dice are independent.
b
What’s the probability that J wins, when n = 2?
25 Consider a gambling game in which you stake a given amount A and you roll a fair die. If k spots turn up, you multiply your stake by 2(k − 1)/5. (In particular, you double the stake whenever you roll ⚅, but you lose everything if you roll ⚀.) You can stop at any time and reclaim the current stake. What are the mean and variance of your stake after n rolls? (Ignore any effects of rounding to integer amounts of currency.)
26 Find the mean and variance of the number of l-cycles in a random permutation of n elements. (The football victory problem discussed in (8.23), (8.24), and (8.53) is the special case l = 1.)
27 Let $X_1, X_2, \ldots, X_n$ be independent samples of the random variable X. Equations (8.19) and (8.20) explain how to estimate the mean and variance of X on the basis of these observations; give an analogous formula for estimating the third cumulant $\kappa_3$. (Your formula should be an “unbiased” estimate, in the sense that its expected value should be $\kappa_3$.)
28 What is the average length of the coin-flipping game (8.78)
a
given that Alice wins?
b given that Bill wins?
29 Alice, Bill, and Computer flip a fair coin until one of the respective patterns A = HHTH, B = HTHH, or C = THHH appears for the first time. (If only two of these patterns were involved, we know from (8.82) that A would probably beat B, that B would probably beat C, and that C would probably beat A; but all three patterns are simultaneously in the game.) What are each player’s chances of winning?
30 The text considers three kinds of variances associated with successful search in a hash table. Actually there are two more: We can consider the average (over k) of the variances (over $h_1, \ldots, h_n$) of $P(h_1, \ldots, h_n; k)$; and we can consider the variance (over k) of the averages (over $h_1, \ldots, h_n$). Evaluate these quantities.
31 An apple is located at vertex A of pentagon ABCDE, and a worm is located two vertices away, at C. Every day the worm crawls with equal probability to one of the two adjacent vertices. Thus after one day the worm is at vertex B with probability 1/2 and at vertex D with probability 1/2. After two days, the worm might be back at C again, because it has no memory of previous positions. When it reaches vertex A, it stops to dine.
[Margin note: Schrödinger’s worm.]
a What are the mean and variance of the number of days until dinner?
b Let p be the probability that the number of days is 100 or more. What does Chebyshev’s inequality say about p?
c What do the tail inequalities (exercise 12) tell us about p?
32 Alice and Bill are in the military, stationed in one of the five states Kansas, Nebraska, Missouri, Oklahoma, or Colorado. Initially Alice is in Nebraska and Bill is in Oklahoma. Every month each person is reassigned to an adjacent state, each adjacent state being equally likely. (Here’s a diagram of the adjacencies: [diagram not reproduced]. The initial states are circled.) For example, Alice is restationed after the first month to Colorado, Kansas, or Missouri, each with probability 1/3. Find the mean and variance of the number of months it takes Alice and Bill to find each other. (You may wish to enlist a computer’s help.)
[Margin note: Definitely a finite-state situation.]
33 Are the random variables $X_1$ and $X_2$ in (8.88) independent?
34 Gina is a golfer who has probability p = .05 on each stroke of making a “supershot” that gains a stroke over par, probability q = .91 of making an ordinary shot, and probability r = .04 of making a “subshot” that costs her a stroke with respect to par. (Non-golfers: At each turn she advances 2, 1, or 0 steps toward her goal, with probability p, q, or r, respectively. On a par-m hole, her score is the minimum n such that she has advanced m or more steps after taking n turns. A low score is better than a high score.)
[Margin note: (Use a calculator for the numerical work on this problem.)]
a
Show that Gina wins a par-4 hole more often than she loses, when
she plays against a player who shoots par. (In other words, the
probability that her score is less than 4 is greater than the probability
that her score is greater than 4.)
b
Show that her average score on a par-4 hole is greater than 4. (There-
fore she tends to lose against a “steady” player on total points, al-
though she would tend to win in match play by holes.)
Exam problems
35 A die has been loaded with the probability distribution
Pr(⚀) = $p_1$; Pr(⚁) = $p_2$; . . . ; Pr(⚅) = $p_6$.
Let $S_n$ be the sum of the spots after this die has been rolled n times. Find a necessary and sufficient condition on the “loading distribution” such that the two random variables $S_n$ mod 2 and $S_n$ mod 3 are independent of each other, for all n.
36 The six faces of a certain die contain the spot patterns
q pJUHpJH
instead of the usual ⚀ through ⚅.
a
Show that there is a way to assign spots to the six faces of another
die so that, when these two dice are thrown, the sum of spots has the
same probability distribution as the sum of spots on two ordinary
dice. (Assume that all 36 face pairs are equally likely.)
b
Generalizing, find all ways to assign spots to the 6n faces of n dice so
that the distribution of spot sums will be the same as the distribution
of spot sums on n ordinary dice. (Each face should receive a positive
integer number of spots.)
37 Let $p_n$ be the probability that exactly n tosses of a fair coin are needed before heads are seen twice in a row, and let $q_n = \sum_{k\ge n} p_k$. Find closed forms for both $p_n$ and $q_n$ in terms of Fibonacci numbers.
38 What is the probability generating function for the number of times you need to roll a fair die until all six faces have turned up? Generalize to m-sided fair dice: Give closed forms for the mean and variance of the number of rolls needed to see l of the m faces. What is the probability that this number will be exactly n?
39 A Dirichlet probability generating function has the form
$$P(z) = \sum_{n\ge1}\frac{p_n}{n^z}.$$
Thus P(0) = 1. If X is a random variable with $\Pr(X=n) = p_n$, express E(X), V(X), and E(ln X) in terms of P(z) and its derivatives.
40 The mth cumulant $\kappa_m$ of the binomial distribution (8.57) has the form $n f_m(p)$, where $f_m$ is a polynomial of degree m. (For example, $f_1(p) = p$ and $f_2(p) = p - p^2$, because the mean and variance are np and npq.)
a Find a closed form for the coefficient of $p^k$ in $f_m(p)$.
b Prove that $f_m(\frac12) = (2^m - 1)B_m/m + [m=1]$, where $B_m$ is the mth Bernoulli number.
41 Let the random variable $X_n$ be the number of flips of a fair coin until heads have turned up a total of n times. Show that $E(X_{n+1}^{-1}) = (-1)^n(\ln 2 + H_{\lfloor n/2\rfloor} - H_n)$. Use the methods of Chapter 9 to estimate this value with an absolute error of $O(n^{-3})$.
42 A certain man has a problem finding work. If he is unemployed on any given morning, there’s constant probability $p_h$ (independent of past history) that he will be hired before that evening; but if he’s got a job when the day begins, there’s constant probability $p_f$ that he’ll be laid off by nightfall. Find the average number of evenings on which he will have a job lined up, assuming that he is initially employed and that this process goes on for n days. (For example, if n = 1 the answer is $1 - p_f$.)
[Margin note: Does TeX choose optimal line breaks?]
43 Find a closed form for the pgf $G_n(z) = \sum_{k\ge0} p_{k,n}z^k$, where $p_{k,n}$ is the probability that a random permutation of n objects has exactly k cycles. What are the mean and standard deviation of the number of cycles?
44 The athletic department runs an intramural “knockout tournament” for $2^n$ tennis players as follows. In the first round, the players are paired off randomly, with each pairing equally likely, and $2^{n-1}$ matches are played. The winners advance to the second round, where the same process produces $2^{n-2}$ winners. And so on; the kth round has $2^{n-k}$ randomly chosen matches between the $2^{n-k+1}$ players who are still undefeated. The nth round produces the champion. Unbeknownst to the tournament organizers, there is actually an ordering among the players, so that $x_1$ is best, $x_2$ is second best, . . . , $x_{2^n}$ is worst. When $x_j$ plays $x_k$ and j < k, the winner is $x_j$ with probability p and $x_k$ with probability 1 − p, independent of the other matches. We assume that the same probability p applies to all j and k.
[Margin note: A peculiar set of tennis players.]
a What’s the probability that $x_1$ wins the tournament?
b What’s the probability that the nth round (the final match) is between the top two players, $x_1$ and $x_2$?
c What’s the probability that the best $2^k$ players are the competitors in the kth-to-last round? (The previous questions were the cases k = 0 and k = 1.)
d Let N(n) be the number of essentially different tournament results; two tournaments are essentially the same if the matches take place between the same players and have the same winners. Prove that $N(n) = 2^n!$.
e What’s the probability that $x_2$ wins the tournament?
f Prove that if 1/2 < p < 1, the probability that $x_j$ wins is strictly greater than the probability that $x_{j+1}$ wins, for $1 \le j < 2^n$.
[Margin note: “A fast arithmetic computation shows that the sherry is always at least three years old. Taking computation further gives the vertigo.” -Revue du vin de France (Nov 1984)]
45 True sherry is made in Spain according to a multistage system called “Solera.” For simplicity we’ll assume that the winemaker has only three barrels, called A, B, and C. Every year a third of the wine from barrel C is bottled and replaced by wine from B; then B is topped off with a third of the wine from A; finally A is topped off with new wine. Let A(z), B(z), C(z) be probability generating functions, where the coefficient of $z^n$ is the fraction of n-year-old wine in the corresponding barrel just after the transfers have been made.
a Assume that the operation has been going on since time immemorial, so that we have a steady state in which A(z), B(z), and C(z) are the same at the beginning of each year. Find closed forms for these generating functions.
b Find the mean and standard deviation of the age of the wine in each barrel, under the same assumptions. What is the average age of the sherry when it is bottled? How much of it is exactly 25 years old?
c Now take the finiteness of time into account: Suppose that all three barrels contained new wine at the beginning of year 0. What is the average age of the sherry that is bottled at the beginning of year n?
46 Stefan Banach used to carry two boxes of matches, each containing n matches initially. Whenever he needed a light he chose a box at random, each with probability 1/2, independent of his previous choices. After taking out a match he’d put the box back in its pocket (even if the box became empty; all famous mathematicians used to do this). When his chosen box was empty he’d throw it away and reach for the other box.
a Once he found that the other box was empty too. What’s the probability that this occurs? (For n = 1 it happens half the time and for n = 2 it happens 3/8 of the time.) To answer this part, find a closed form for the generating function $P(w,z) = \sum_{m,n} p_{m,n}w^m z^n$, where $p_{m,n}$ is the probability that, starting with m matches in one box and n in the other, both boxes are empty when an empty box is first chosen. Then find a closed form for $p_{n,n}$.
b Generalizing your answer to part (a), find a closed form for the probability that exactly k matches are in the other box when an empty one is first thrown away.
c Find a closed form for the average number of matches in that other box.
[Margin note: And for the number in the empty box.]
47 Some physicians, collaborating with some physicists, recently discovered a pair of microbes that reproduce in a peculiar way. The male microbe, called a diphage, has two receptors on its surface; the female microbe, called a triphage, has three:
[Figure: schematic drawings of a diphage, a triphage, and a receptor.]
When a culture of diphages and triphages is irradiated with a psi-particle, exactly one of the receptors on one of the phages absorbs the particle; each receptor is equally likely. If it was a diphage receptor, that diphage changes to a triphage; if it was a triphage receptor, that triphage splits into two diphages. Thus if an experiment starts with one diphage, the first psi-particle changes it to a triphage, the second particle splits the triphage into two diphages, and the third particle changes one of the diphages to a triphage. The fourth particle hits either the diphage or the triphage; then there are either two triphages (probability 2/5) or three diphages (probability 3/5). Find a closed form for the average number of diphages present, if we begin with a single diphage and irradiate the culture n times with single psi-particles.
48 Five people stand at the vertices of a pentagon, throwing frisbees to each other.
[Margin note: Or, if this pentagon is in Arlington, throwing missiles at each other.]
[Figure: pentagon with two frisbees at adjacent vertices.]
[Margin note: Frisbee is a trademark of Wham-O Manufacturing Company.]
They have two frisbees, initially at adjacent vertices as shown. In each time interval, each frisbee is thrown either to the left or to the right (along an edge of the pentagon) with equal probability. This process continues until one person is the target of two frisbees simultaneously; then the game stops. (All throws are independent of past history.)
a Find the mean and variance of the number of pairs of throws.
b Find a closed form for the probability that the game lasts more than 100 steps, in terms of Fibonacci numbers.
49 Luke Snowwalker spends winter vacations at his mountain cabin. The front porch has m pairs of boots and the back porch has n pairs. Every time he goes for a walk he flips a (fair) coin to decide whether to leave from the front porch or the back, and he puts on a pair of boots at that porch and heads off. There’s a 50/50 chance that he returns to each porch, independent of his starting point, and he leaves the boots at the porch he returns to. Thus after one walk there will be m + [−1, 0, or +1] pairs on the front porch and n − [+1, 0, or −1] pairs on the back porch. If all the boots pile up on one porch and if he decides to leave from the other, he goes without boots and gets frostbite, ending his vacation. Assuming that he continues his walks until the bitter end, let $P_N(m,n)$ be the probability that he completes exactly N nonfrostbitten trips, starting with m pairs on the front porch and n on the back. Thus, if both m and n are positive,
$$P_N(m,n) = \tfrac14 P_{N-1}(m-1,n+1) + \tfrac12 P_{N-1}(m,n) + \tfrac14 P_{N-1}(m+1,n-1);$$
this follows because this first trip is either front/back, front/front, back/back, or back/front, each with probability 1/4, and N − 1 trips remain.
a Complete the recurrence for $P_N(m,n)$ by finding formulas that hold when m = 0 or n = 0. Use the recurrence to obtain equations that hold among the probability generating functions
$$g_{m,n}(z) = \sum_{N\ge0} P_N(m,n)\,z^N.$$
b Differentiate your equations and set z = 1, thereby obtaining relations among the quantities $g'_{m,n}(1)$. Solve these equations, thereby determining the mean number of trips before frostbite.
c Show that $g_{m,n}$ has a closed form if we substitute $z = 1/\cos^2\theta$:
$$g_{m,n}\Bigl(\frac{1}{\cos^2\theta}\Bigr) = \frac{\sin(2m+1)\theta + \sin(2n+1)\theta}{\sin(2m+2n+2)\theta}\,\cos\theta.$$
50 Consider the function
$$H(z) = 1 + \frac{1-z}{2z}\Bigl(z - 3 + \sqrt{(1-z)(9-z)}\Bigr).$$
The purpose of this problem is to prove that $H(z) = \sum_{k\ge0} h_k z^k$ is a probability generating function, and to obtain some basic facts about it.
a Let $(1-z)^{3/2}(9-z)^{1/2} = \sum_{k\ge0} c_k z^k$. Prove that $c_0 = 3$, $c_1 = -14/3$, $c_2 = 37/27$, and $c_{3+l} = 3\sum_k \binom{l}{k}\binom{1/2}{k+3}\bigl(\frac89\bigr)^{k+3}$ for all l ≥ 0. Hint: Use the identity
$$(9-z)^{1/2} = 3(1-z)^{1/2}\bigl(1 + \tfrac89 z/(1-z)\bigr)^{1/2}$$
and expand the last factor in powers of z/(1 − z).
b Use part (a) and exercise 5.81 to show that the coefficients of H(z) are all positive.
c Prove the amazing identity
$$\sqrt{\frac{9-H(z)}{1-H(z)}} = \sqrt{\frac{9-z}{1-z}} + 2.$$
d What are the mean and variance of H?
51 The state lottery in El Dorado uses the payoff distribution H defined in the previous problem. Each lottery ticket costs 1 doubloon, and the payoff is k doubloons with probability $h_k$. Your chance of winning with each ticket is completely independent of your chance with other tickets; in other words, winning or losing with one ticket does not affect your probability of winning with any other ticket you might have purchased in the same lottery.
a Suppose you start with one doubloon and play this game. If you win k doubloons, you buy k tickets in the second game; then you take the total winnings in the second game and apply all of them to the third; and so on. If none of your tickets is a winner, you’re broke and you have to stop gambling. Prove that the pgf of your current holdings after n rounds of such play is
$$1 - \frac{4}{\sqrt{(9-z)/(1-z)} + 2n - 1} + \frac{4}{\sqrt{(9-z)/(1-z)} + 2n + 1}.$$
b Let $g_n$ be the probability that you lose all your money for the first time on the nth game, and let $G(z) = g_1 z + g_2 z^2 + \cdots$. Prove that G(1) = 1. (This means that you’re bound to lose sooner or later, with probability 1, although you might have fun playing in the meantime.) What are the mean and the variance of G?
c What is the average total number of tickets you buy, if you continue to play until going broke?
d What is the average number of games until you lose everything if you start with two doubloons instead of just one?
[Margin note: A doubledoubloon.]
Bonus problems
52 Show that the text’s definitions of median and mode for random variables correspond in some meaningful sense to the definitions of median and mode for sequences, when the probability space is finite.
53 Prove or disprove: If X, Y, and Z are random variables with the property that all three pairs (X, Y), (X, Z) and (Y, Z) are independent, then X + Y is independent of Z.
54 Equation (8.20) proves that the average value of $\hat{V}X$ is VX. What is the variance of $\hat{V}X$?
55 A normal deck of playing cards contains 52 cards, four each with face values in the set {A, 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K}. Let X and Y denote the respective face values of the top and bottom cards, and consider the following algorithm for shuffling:
S1 Permute the deck randomly so that each arrangement occurs with probability 1/52!.
S2 If X ≠ Y, flip a biased coin that comes up heads with probability p, and go back to step S1 if heads turns up. Otherwise stop.
Each coin flip and each permutation is assumed to be independent of all the other randomizations. What value of p will make X and Y independent random variables after this procedure stops?
56 Generalize the frisbee problem of exercise 48 from a pentagon to an m-gon. What are the mean and variance of the number of collision-free throws in general, when the frisbees are initially at adjacent vertices? Show that, if m is odd, the pgf for the number of throws can be written as a product of coin-flipping distributions:
$$G_m(z) = \prod_{k=1}^{(m-1)/2} \frac{p_k z}{1 - q_k z}, \quad\text{where } p_k = \sin^2\frac{(2k-1)\pi}{2m},\quad q_k = \cos^2\frac{(2k-1)\pi}{2m}.$$
Hint: Try the substitution $z = 1/\cos^2\theta$.
57 Prove that the Penney-ante pattern $\tau_1\tau_2\ldots\tau_{l-1}\tau_l$ is always inferior to the pattern $\bar\tau_2\tau_1\tau_2\ldots\tau_{l-1}$ when a fair coin is flipped, if l ≥ 3.
58 Are there patterns A and B of heads and tails such that A is longer
than B, yet A appears before B more than half the time when a fair coin
is being flipped?
59 Let k and n be fixed positive integers with k < n.
a Find a closed form for the probability generating function
$$G(w,z) = \frac{1}{m^n}\sum_{h_1=1}^{m}\cdots\sum_{h_n=1}^{m} w^{P(h_1,\ldots,h_n;k)}\,z^{P(h_1,\ldots,h_n;n)}$$
for the joint distribution of the numbers of probes needed to find the kth and nth items that have been inserted into a hash table with m lists.
b Although the random variables $P(h_1,\ldots,h_n;k)$ and $P(h_1,\ldots,h_n;n)$ are dependent, show that they are somewhat independent:
$$E\bigl(P(h_1,\ldots,h_n;k)\,P(h_1,\ldots,h_n;n)\bigr) = \bigl(EP(h_1,\ldots,h_n;k)\bigr)\bigl(EP(h_1,\ldots,h_n;n)\bigr).$$
60 Use the result of the previous exercise to prove (8.103).
61
Continuing exercise 47, find the variance of the number of diphages after
n irradiations.
Research problems
62 The normal distribution is a non-discrete probability distribution characterized by having all its cumulants zero except the mean and the variance. Is there an easy way to tell if a given sequence of cumulants $(\kappa_1, \kappa_2, \kappa_3, \ldots)$ comes from a discrete distribution? (All the probabilities must be “atomic” in a discrete distribution.)
63 Is there any sequence $A = \tau_1\tau_2\ldots\tau_{l-1}\tau_l$ of l ≥ 3 heads and tails such that the sequences $\mathrm{H}\tau_1\tau_2\ldots\tau_{l-1}$ and $\mathrm{T}\tau_1\tau_2\ldots\tau_{l-1}$ both perform equally well against A in the game of Penney ante?
9
Asymptotics
EXACT ANSWERS are great when we can find them; there’s something
very satisfying about complete knowledge. But there’s also a time when
approximations are in order. If we run into a sum or a recurrence whose
solution doesn’t have a closed form (as far as we can tell), we still would like
to know something about the answer; we don’t have to insist on all or nothing.
And even if we do have a closed form, our knowledge might be imperfect, since
we might not know how to compare it with other closed forms.
For example, there is (apparently) no closed form for the sum
$$S_n = \sum_{k=0}^{n}\binom{3n}{k}.$$
But it is nice to know that
$$S_n \sim 2\binom{3n}{n}, \quad\text{as } n\to\infty;$$
we say that the sum is “asymptotic to” $2\binom{3n}{n}$. It’s even nicer to have more detailed information, like
$$S_n = \binom{3n}{n}\Bigl(2 - \frac{4}{n} + O\Bigl(\frac{1}{n^2}\Bigr)\Bigr), \tag{9.1}$$
which gives us a “relative error of order $1/n^2$.” But even this isn’t enough to tell us how big $S_n$ is, compared with other quantities. Which is larger, $S_n$ or the Fibonacci number $F_{4n}$? Answer: We have $S_2 = 22 > F_8 = 21$ when n = 2; but $F_{4n}$ is eventually larger, because $F_{4n} \sim \phi^{4n}/\sqrt5$ and $\phi^4 \approx 6.8541$, while
$$S_n = \sqrt{\frac{3}{\pi n}}\,(6.75)^n\Bigl(1 - \frac{151}{72n} + O\Bigl(\frac{1}{n^2}\Bigr)\Bigr). \tag{9.2}$$
[Margin note: Uh oh . . . here comes that A-word.]
Our goal in this chapter is to learn how to understand and to derive results like this without great pain.
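The claims above are easy to probe numerically. The following sketch assumes that $S_n$ is the sum of $\binom{3n}{k}$ over 0 ≤ k ≤ n (this reading reproduces the stated value $S_2 = 22$); the helper names are illustrative. It compares the exact sum with the right-hand sides of (9.1) and (9.2) after dropping their O-terms, and with $F_{4n}$.

```python
from math import comb, pi, sqrt

def S(n):
    """Exact value of S_n = sum of C(3n, k) for 0 <= k <= n."""
    return sum(comb(3 * n, k) for k in range(n + 1))

def fib(n):
    """Fibonacci number F_n, with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def approx_9_1(n):
    """Right side of (9.1) with the O(1/n^2) term dropped."""
    return comb(3 * n, n) * (2 - 4 / n)

def approx_9_2(n):
    """Right side of (9.2) with the O(1/n^2) term dropped."""
    return sqrt(3 / (pi * n)) * 6.75 ** n * (1 - 151 / (72 * n))

print(S(2), fib(8))
# n stays at most 100 here so that the floating-point values do not overflow.
for n in (10, 30, 100):
    s = S(n)
    print(n, s / approx_9_1(n), s / approx_9_2(n), s < fib(4 * n))
```

Both ratios should drift toward 1 as n grows, and the last column shows whether $F_{4n}$ has already overtaken $S_n$.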
The word asymptotic stems from a Greek root meaning “not falling together.” When ancient Greek mathematicians studied conic sections, they considered hyperbolas like the graph of $y = \sqrt{1+x^2}$, which has the lines y = x and y = −x as “asymptotes.” The curve approaches but never quite touches these asymptotes, when x → ∞. Nowadays we use “asymptotic” in a broader sense to mean any approximate value that gets closer and closer to the truth, when some parameter approaches a limiting value. For us, asymptotics means “almost falling together.”
[Margin note: Other words like ‘symptom’ and ‘ptomaine’ also come from this root.]
Some asymptotic formulas are very difficult to derive, well beyond the scope of this book. We will content ourselves with an introduction to the subject; we hope to acquire a suitable foundation on which further techniques can be built. We will be particularly interested in understanding the definitions of ‘∼’ and ‘O’ and similar symbols, and we’ll study basic ways to manipulate asymptotic quantities.
9.1 A HIERARCHY
Functions of n that occur in practice usually have different “asymptotic growth ratios”; one of them will approach infinity faster than another. We formalize this by saying that
$$f(n) \prec g(n) \iff \lim_{n\to\infty}\frac{f(n)}{g(n)} = 0. \tag{9.3}$$
This relation is transitive: If f(n) ≺ g(n) and g(n) ≺ h(n) then f(n) ≺ h(n). We also may write g(n) ≻ f(n) if f(n) ≺ g(n). This notation was introduced in 1871 by Paul du Bois-Reymond [29].
[Margin note: All functions great and small.]
For example, $n \prec n^2$; informally we say that n grows more slowly than $n^2$. In fact,
$$n^{\alpha} \prec n^{\beta} \iff \alpha < \beta, \tag{9.4}$$
when α and β are arbitrary real numbers.
[Margin note: A loerarchy?]
There are, of course, many functions of n besides powers of n. We can use the ≺ relation to rank lots of functions into an asymptotic pecking order
that includes entries like this:
$$1 \prec \log\log n \prec \log n \prec n^{\epsilon} \prec n^{c} \prec n^{\log n} \prec c^{n} \prec n^{n} \prec c^{c^{n}}.$$
(Here ε and c are arbitrary constants with 0 < ε < 1 < c.)
All functions listed here, except 1, go to infinity as n goes to infinity. Thus when we try to place a new function in this hierarchy, we’re not trying to determine whether it becomes infinite but rather how fast.
It helps to cultivate an expansive attitude when we’re doing asymptotic analysis: We should THINK BIG, when imagining a variable that approaches infinity. For example, the hierarchy says that $\log n \prec n^{0.0001}$; this might seem wrong if we limit our horizons to teeny-tiny numbers like one googol, $n = 10^{100}$. For in that case, log n = 100, while $n^{0.0001}$ is only $10^{0.01} \approx 1.0233$. But if we go up to a googolplex, $n = 10^{10^{100}}$, then $\log n = 10^{100}$ pales in comparison with $n^{0.0001} = 10^{10^{96}}$.
Even if ε is extremely small (smaller than, say, $1/10^{10^{100}}$), the value of log n will be much smaller than the value of $n^{\epsilon}$, if n is large enough. For if we set $n = 10^{10^{2k}}$, where k is so large that $\epsilon \ge 10^{-k}$, we have $\log n = 10^{2k}$ but $n^{\epsilon} \ge 10^{10^{k}}$. The ratio $(\log n)/n^{\epsilon}$ therefore approaches zero as n → ∞.
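Numbers like a googolplex cannot be stored directly, but the comparison can still be carried out on logarithms. The sketch below assumes base-10 logarithms, as in the googol example, and represents n only through $\log_{10} n$; the function name is a hypothetical helper.

```python
from math import log10

def log10_of_n_pow_eps(log10_n, eps):
    """Return log10(n**eps), given log10(n) instead of n itself."""
    return eps * log10_n

# Googol: n = 10**100, so log n = 100.
googol_log = 100.0
n_pow_eps = 10 ** log10_of_n_pow_eps(googol_log, 0.0001)
print(googol_log, n_pow_eps)            # log n is far larger than n^0.0001 here

# Googolplex: log10(n) = 10**100, far too large to exponentiate,
# so compare log10(log n) with log10(n^0.0001) instead.
googolplex_log = 1e100
print(log10(googolplex_log), log10_of_n_pow_eps(googolplex_log, 0.0001))
```

In the second comparison the left value is the base-10 logarithm of log n, while the right value is the base-10 logarithm of $n^{0.0001}$; the latter is about $10^{96}$, so $n^{0.0001}$ wins by an enormous margin.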
The hierarchy shown above deals with functions that go to infinity. Often, however, we’re interested in functions that go to zero, so it’s useful to have a similar hierarchy for those functions. We get one by taking reciprocals, because when f(n) and g(n) are never zero we have
$$f(n) \prec g(n) \iff \frac{1}{g(n)} \prec \frac{1}{f(n)}. \tag{9.5}$$
Thus, for example, the following functions (except 1) all go to zero:
$$\frac{1}{c^{c^n}} \prec \frac{1}{n^n} \prec \frac{1}{c^n} \prec \frac{1}{n^{\log n}} \prec \frac{1}{n^c} \prec \frac{1}{n^{\epsilon}} \prec \frac{1}{\log n} \prec \frac{1}{\log\log n} \prec 1.$$
Let’s look at a few other functions to see where they fit in. The number π(n) of primes less than or equal to n is known to be approximately n/ln n. Since $1/n^{\epsilon} \prec 1/\ln n \prec 1$, multiplying by n tells us that
$$n^{1-\epsilon} \prec \pi(n) \prec n.$$
We can in fact generalize (9.4) by noticing, for example, that
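$$n^{\alpha_1}(\log n)^{\alpha_2}(\log\log n)^{\alpha_3} \prec n^{\beta_1}(\log n)^{\beta_2}(\log\log n)^{\beta_3} \iff (\alpha_1,\alpha_2,\alpha_3) < (\beta_1,\beta_2,\beta_3). \tag{9.6}$$
Here ‘$(\alpha_1,\alpha_2,\alpha_3) < (\beta_1,\beta_2,\beta_3)$’ means lexicographic order (dictionary order); in other words, either $\alpha_1 < \beta_1$, or $\alpha_1 = \beta_1$ and $\alpha_2 < \beta_2$, or $\alpha_1 = \beta_1$ and $\alpha_2 = \beta_2$ and $\alpha_3 < \beta_3$.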
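Rule (9.6) reduces a growth comparison to a dictionary-order comparison of exponent triples, which is exactly how Python compares tuples. The following sketch is an illustration under assumptions: `precedes` and `log_ratio` are hypothetical helper names, natural logarithms are used throughout, and the function is parameterized by ln n so that very large n can be tried.

```python
import math

def precedes(alpha, beta):
    """Rule (9.6): compare exponent triples in dictionary order."""
    return tuple(alpha) < tuple(beta)  # Python tuples compare lexicographically

def log_ratio(alpha, beta, ln_n):
    """ln(f/g) for f = n^a1 (ln n)^a2 (ln ln n)^a3 and g likewise, given ln n."""
    terms = (ln_n, math.log(ln_n), math.log(math.log(ln_n)))
    return sum((a - b) * t for a, b, t in zip(alpha, beta, terms))

f_exp, g_exp = (1, 0, 5), (1, 1, 0)       # n (ln ln n)^5  versus  n ln n
print(precedes(f_exp, g_exp))
print(log_ratio(f_exp, g_exp, 690.0))     # ln n for n near 10**300: f is still larger
print(log_ratio(f_exp, g_exp, 1e100))     # much larger n: f/g has become tiny
```

The middle line is a reminder to THINK BIG: the ordering promised by (9.6) is asymptotic, and here it has not yet taken hold even when n has 300 digits.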
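How about the function $e^{\sqrt{\log n}}$; where does it live in the hierarchy? We can answer questions like this by using the rule
$$e^{f(n)} \prec e^{g(n)} \iff \lim_{n\to\infty}\bigl(f(n) - g(n)\bigr) = -\infty, \tag{9.7}$$
which follows in two steps from definition (9.3) by taking logarithms. Consequently
$$1 \prec f(n) \prec g(n) \implies e^{|f(n)|} \prec e^{|g(n)|}.$$
And since $1 \prec \log\log n \prec \sqrt{\log n} \prec \epsilon\log n$, we have $\log n \prec e^{\sqrt{\log n}} \prec n^{\epsilon}$.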
When two functions f(n) and g(n) have the same rate of growth, we
write ‘$f(n) \asymp g(n)$’. The official definition is:
$$f(n) \asymp g(n) \iff |f(n)| \le C|g(n)| \text{ and } |g(n)| \le C|f(n)|, \text{ for some } C \text{ and for all sufficiently large } n. \tag{9.8}$$
This holds, for example, if f(n) is constant and $g(n) = \cos n + \arctan n$. We will prove shortly that it holds whenever f(n) and g(n) are polynomials of the same degree. There’s also a stronger relation, defined by the rule
$$f(n) \sim g(n) \iff \lim_{n\to\infty}\frac{f(n)}{g(n)} = 1. \tag{9.9}$$
In this case we say that “f(n) is asymptotic to g(n).”
G. H. Hardy [148] introduced an interesting and important concept called the class of logarithmico-exponential functions, defined recursively as the smallest family $\mathcal{L}$ of functions satisfying the following properties:
• The constant function $f(n) = \alpha$ is in $\mathcal{L}$, for all real $\alpha$.
• The identity function f(n) = n is in $\mathcal{L}$.
• If f(n) and g(n) are in $\mathcal{L}$, so is f(n) − g(n).
• If f(n) is in $\mathcal{L}$, so is $e^{f(n)}$.
• If f(n) is in $\mathcal{L}$ and is “eventually positive,” then $\ln f(n)$ is in $\mathcal{L}$.
A function f(n) is called “eventually positive” if there is an integer $n_0$ such that f(n) > 0 whenever $n \ge n_0$.
We can use these rules to show, for example, that f(n) + g(n) is in $\mathcal{L}$ whenever f(n) and g(n) are, because f(n) + g(n) = f(n) − (0 − g(n)). If f(n) and g(n) are eventually positive members of $\mathcal{L}$, their product $f(n)\,g(n) = e^{\ln f(n) + \ln g(n)}$ and quotient $f(n)/g(n) = e^{\ln f(n) - \ln g(n)}$ are in $\mathcal{L}$; so are functions like $\sqrt{f(n)} = e^{\frac12\ln f(n)}$, etc. Hardy proved that every logarithmico-exponential function is eventually positive, eventually negative, or identically zero. Therefore the product and quotient of any two $\mathcal{L}$-functions is in $\mathcal{L}$, except that we cannot divide by a function that’s identically zero.
“. . . we express by the symbol O(n) a quantity whose order with respect to n does not exceed the order of n; whether it actually contains terms of order n is left undecided by the reasoning so far.”
- P. Bachmann [14]
Hardy’s main theorem about logarithmico-exponential functions is that they form an asymptotic hierarchy: If f(n) and g(n) are any functions in $\mathcal{L}$, then either $f(n) \prec g(n)$, or $f(n) \succ g(n)$, or $f(n) \asymp g(n)$. In the last case there is, in fact, a constant $\alpha$ such that
$$f(n) \sim \alpha\,g(n).$$
The proof of Hardy’s theorem is beyond the scope of this book; but it’s nice to know that the theorem exists, because almost every function we ever need to deal with is in $\mathcal{L}$. In practice, we can generally fit a given function into a given hierarchy without great difficulty.
9.2 O NOTATION
A wonderful notational convention for asymptotic analysis was introduced by Paul Bachmann in 1894 and popularized in subsequent years by Edmund Landau and others. We have seen it in formulas like
$$H_n = \ln n + \gamma + O(1/n), \tag{9.10}$$
which tells us that the nth harmonic number is equal to the natural logarithm of n plus Euler’s constant, plus a quantity that is “Big Oh of 1 over n.” This last quantity isn’t specified exactly; but whatever it is, the notation claims that its absolute value is no more than a constant times 1/n.
The beauty of O-notation is that it suppresses unimportant detail and lets us concentrate on salient features: The quantity O(1/n) is negligibly small, if constant multiples of 1/n are unimportant.
Furthermore we get to use O right in the middle of a formula. If we want to express (9.10) in terms of the notations in Section 9.1, we must transpose ‘$\ln n + \gamma$’ to the left side and specify a weaker result like
$$H_n - \ln n - \gamma \prec \frac{\log\log n}{n}$$
or a stronger result like
$$H_n - \ln n - \gamma \asymp \frac{1}{n}.$$
The Big Oh notation allows us to specify an appropriate amount of detail in place, without transposition.
The idea of imprecisely specified quantities can be made clearer if we consider some additional examples. We occasionally use the notation ‘$\pm1$’ to stand for something that is either +1 or −1; we don’t know (or perhaps we don’t care) which it is, yet we can manipulate it in formulas.
N. G. de Bruijn begins his book Asymptotic Methods in Analysis by considering a Big Ell notation that helps us understand Big Oh. If we write L(5) for a number whose absolute value is less than 5 (but we don’t say what the number is), then we can perform certain calculations without knowing the full truth. For example, we can deduce formulas such as 1 + L(5) = L(6); L(2) + L(3) = L(5); L(2)L(3) = L(6); $e^{L(5)} = L(e^5)$; and so on. But we cannot conclude that L(5) − L(3) = L(2), since the left side might be 4 − 0. In fact, the most we can say is L(5) − L(3) = L(8).
Bachmann’s O-notation is similar to L-notation but it’s even less precise: $O(\alpha)$ stands for a number whose absolute value is at most a constant times $|\alpha|$. We don’t say what the number is and we don’t even say what the constant is. Of course the notion of a “constant” is nonsense if there is nothing variable in the picture, so we use O-notation only in contexts when there’s at least one quantity (say n) whose value is varying. The formula
$$f(n) = O\bigl(g(n)\bigr) \quad\text{for all } n \tag{9.11}$$
means in this context that there is a constant C such that
$$|f(n)| \le C\,|g(n)| \quad\text{for all } n; \tag{9.12}$$
and when O(g(n)) stands in the middle of a formula it represents a function f(n) that satisfies (9.12). The values of f(n) are unknown, but we do know that they aren’t too large. Similarly, de Bruijn’s ‘L(n)’ represents an unspecified function f(n) whose values satisfy |f(n)| < |n|. The main difference between L and O is that O-notation involves an unspecified constant C; each appearance of O might involve a different C, but each C is independent of n.
For example, we know that the sum of the first n squares is
$$\Box_n = \tfrac13 n\bigl(n+\tfrac12\bigr)(n+1) = \tfrac13 n^3 + \tfrac12 n^2 + \tfrac16 n.$$
We can write $\Box_n = O(n^3)$ because
$$\bigl|\tfrac13 n^3 + \tfrac12 n^2 + \tfrac16 n\bigr| \le \tfrac13|n|^3 + \tfrac12|n|^2 + \tfrac16|n| \le \tfrac13|n^3| + \tfrac12|n^3| + \tfrac16|n^3| = |n^3|$$
for all integers n. Similarly, we have the more specific formula
$$\Box_n = \tfrac13 n^3 + O(n^2);$$
we can also be sloppy and throw away information, saying that
$$\Box_n = O(n^{10}).$$
It’s not nonsense, but it is pointless.
I’ve got a little list -- I’ve got a little list, Of annoying terms and details that might well be under ground, And that never would be missed -- that never would be missed.
Nothing in the definition of O requires us to give a best possible bound.
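Both bounds on the sum of squares are easy to spot-check with exact rational arithmetic. The sketch below is a finite check over a chosen range of integers, not a proof; `square_sum` and `check_bounds` are illustrative names, and the constant C = 1 for the $O(n^2)$ bound is an assumption that happens to suffice here.

```python
from fractions import Fraction

def square_sum(n):
    """The closed form n^3/3 + n^2/2 + n/6, evaluated exactly."""
    n = Fraction(n)
    return n**3 / 3 + n**2 / 2 + n / 6

def check_bounds(limit=200):
    """Check |sum| <= |n^3| and |sum - n^3/3| <= n^2 for integers |n| <= limit."""
    for n in range(-limit, limit + 1):
        s = square_sum(n)
        if abs(s) > abs(n) ** 3:
            return False
        if abs(s - Fraction(n) ** 3 / 3) > n * n:  # assumed constant C = 1
            return False
    return True

print(check_bounds())
print(square_sum(4) == 1 + 4 + 9 + 16)
```

Negative integers are included on purpose, because the inequality in the text is claimed for all integers n.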
You are the fairest
of your sex,
Let me be your
hero;
I love you as
one over x,
As x approaches
zero.
Positively.
But wait a minute. What if the variable n isn’t an integer? What if we have a formula like $S(x) = \frac13x^3 + \frac12x^2 + \frac16x$, where x is a real number? Then we cannot say that $S(x) = O(x^3)$, because the ratio $S(x)/x^3 = \frac13 + \frac12x^{-1} + \frac16x^{-2}$ becomes unbounded when $x \to 0$. And we cannot say that S(x) = O(x), because the ratio $S(x)/x = \frac13x^2 + \frac12x + \frac16$ becomes unbounded when $x \to \infty$. So we apparently can’t use O-notation with S(x).
The answer to this dilemma is that variables used with O are generally subject to side conditions. For example, if we stipulate that $|x| \ge 1$, or that $x \ge \epsilon$ where $\epsilon$ is any positive constant, or that x is an integer, then we can write $S(x) = O(x^3)$. If we stipulate that $|x| \le 1$, or that $|x| \le c$ where c is any positive constant, then we can write S(x) = O(x). The O-notation is governed by its environment, by constraints on the variables involved.
These constraints are often specified by a limiting relation. For example, we might say that
$$f(n) = O\bigl(g(n)\bigr) \quad\text{as } n \to \infty. \tag{9.13}$$
This means that the O-condition is supposed to hold when n is “near” $\infty$; we don’t care what happens unless n is quite large. Moreover, we don’t even specify exactly what “near” means; in such cases each appearance of O implicitly asserts the existence of two constants C and $n_0$, such that
$$|f(n)| \le C\,|g(n)| \quad\text{whenever } n \ge n_0. \tag{9.14}$$
The values of C and $n_0$ might be different for each O, but they do not depend on n. Similarly, the notation
$$f(x) = O\bigl(g(x)\bigr) \quad\text{as } x \to 0$$
means that there exist two constants C and $\epsilon$ such that
$$|f(x)| \le C\,|g(x)| \quad\text{whenever } |x| \le \epsilon. \tag{9.15}$$
The limiting value does not have to be $\infty$ or 0; we can write
$$\ln z = z - 1 + O\bigl((z-1)^2\bigr) \quad\text{as } z \to 1,$$
because it can be proved that $|\ln z - z + 1| \le |z-1|^2$ when $|z-1| \le \frac12$.
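The inequality for ln z can be probed on a grid of complex points in the disc $|z-1| \le \frac12$. This is a sampling sketch rather than a proof; the grid density and the name `worst_ratio` are arbitrary choices made for the illustration, and the principal branch of the logarithm (as returned by `cmath.log`) is assumed.

```python
import cmath

def worst_ratio(steps=40, radius=0.5):
    """Largest |ln z - z + 1| / |z - 1|^2 over a grid with 0 < |z - 1| <= radius."""
    worst = 0.0
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            w = complex(i, j) * (radius / steps)  # w = z - 1
            if w == 0 or abs(w) > radius:
                continue
            ratio = abs(cmath.log(1 + w) - w) / abs(w) ** 2
            worst = max(worst, ratio)
    return worst

print(worst_ratio())  # stays below 1, so C = 1 works on the sampled points
```

In the language of (9.15), shifted to the limit point z = 1, the sampled constant C = 1 and the radius 1/2 play the roles of C and $\epsilon$.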
Our definition of O has gradually developed, over a few pages, from something that seemed pretty obvious to something that seems rather complex; we now have O representing an undefined function and either one or two unspecified constants, depending on the environment. This may seem complicated enough for any reasonable notation, but it’s still not the whole story! Another subtle consideration lurks in the background. Namely, we need to realize that it’s fine to write
$$\tfrac13n^3 + \tfrac12n^2 + \tfrac16n = O(n^3),$$
but we should never write this equality with the sides reversed. Otherwise we could deduce ridiculous things like $n = n^2$ from the identities $n = O(n^2)$ and $n^2 = O(n^2)$. When we work with O-notation and any other formulas that involve imprecisely specified quantities, we are dealing with one-way equalities. The right side of an equation does not give more information than the left side, and it may give less; the right is a “crudification” of the left.
From a strictly formal point of view, the notation O(g(n)) does not stand for a single function f(n), but for the set of all functions f(n) such that $|f(n)| \le C|g(n)|$ for some constant C. An ordinary formula g(n) that doesn’t involve O-notation stands for the set containing a single function f(n) = g(n). If S and T are sets of functions of n, the notation S + T stands for the set of all functions of the form f(n) + g(n), where $f(n) \in S$ and $g(n) \in T$; other notations like S − T, ST, S/T, $\sqrt{S}$, $e^S$, $\ln S$ are defined similarly. Then an “equation” between such sets of functions is, strictly speaking, a set inclusion; the ‘=’ sign really means ‘$\subseteq$’. These formal definitions put all of our O manipulations on firm logical ground.
For example, the “equation”
$$\tfrac13n^3 + O(n^2) = O(n^3)$$
means that $S_1 \subseteq S_2$, where $S_1$ is the set of all functions of the form $\frac13n^3 + f_1(n)$ such that there exists a constant $C_1$ with $|f_1(n)| \le C_1|n^2|$, and where $S_2$ is the set of all functions $f_2(n)$ such that there exists a constant $C_2$ with $|f_2(n)| \le C_2|n^3|$. We can formally prove this “equation” by taking an arbitrary element of the left-hand side and showing that it belongs to the right-hand side: Given $\frac13n^3 + f_1(n)$ such that $|f_1(n)| \le C_1|n^2|$, we must prove that there’s a constant $C_2$ such that $|\frac13n^3 + f_1(n)| \le C_2|n^3|$. The constant $C_2 = \frac13 + C_1$ does the trick, since $n^2 \le |n^3|$ for all integers n.
If ‘=’ really means ‘$\subseteq$’, why don’t we use ‘$\subseteq$’ instead of abusing the equals sign? There are four reasons.
First, tradition. Number theorists started using the equals sign with O-notation and the practice stuck. It’s sufficiently well established by now that we cannot hope to get the mathematical community to change.
Second, tradition. Computer people are quite used to seeing equals signs abused; for years FORTRAN and BASIC programmers have been writing assignment statements like ‘N = N + 1’. One more abuse isn’t much.
Third, tradition. We often read ‘=’ as the word ‘is’. For instance we verbalize the formula $H_n = O(\log n)$ by saying “H sub n is Big Oh of log n.”
“And to auoide the tediouse repetition of these woordes: is equalle to: I will sette as I doe often in woorke use, a paire of paralleles, or Gemowe lines of one lengthe, thus: =, bicause noe .2. thynges, can be moare equalle.”
- R. Recorde [246]
“It is obvious that the sign = is really the wrong sign for such relations, because it suggests symmetry, and there is no such symmetry. . . . Once this warning has been given, there is, however, not much harm in using the sign =, and we shall maintain it, for no other reason than that it is customary.”
- N. G. de Bruijn [62]
(Now is a good time to do warmup exercises 3 and 4.)
And in English, this ‘is’ is one-way. We say that a bird is an animal, but we don’t say that an animal is a bird; “animal” is a crudification of “bird.”
Fourth, for our purposes it’s natural. If we limited our use of O-notation to situations where it occupies the whole right side of a formula (as in the harmonic number approximation $H_n = O(\log n)$, or as in the description of a sorting algorithm’s running time $T(n) = O(n\log n)$), it wouldn’t matter whether we used ‘=’ or something else. But when we use O-notation in the middle of an expression, as we usually do in asymptotic calculations, our intuition is well satisfied if we think of the equals sign as an equality, and if we think of something like O(1/n) as a very small quantity.
So we’ll continue to use ‘=’, and we’ll continue to regard O(g(n)) as an incompletely specified function, knowing that we can always fall back on the set-theoretic definition if we must.
But we ought to mention one more technicality while we’re picking nits about definitions: If there are several variables in the environment, O-notation formally represents sets of functions of two or more variables, not just one. The domain of each function is every variable that is currently “free” to vary.
This concept can be a bit subtle, because a variable might be defined only in parts of an expression, when it’s controlled by a $\sum$ or something similar. For example, let’s look closely at the equation
$$\sum_{k=0}^{n}\bigl(k^2 + O(k)\bigr) = \tfrac13n^3 + O(n^2), \qquad\text{integer } n \ge 0. \tag{9.16}$$
The expression $k^2 + O(k)$ on the left stands for the set of all two-variable functions of the form $k^2 + f(k,n)$ such that there exists a constant C with $|f(k,n)| \le Ck$ for $0 \le k \le n$. The sum of this set of functions, for $0 \le k \le n$, is the set of all functions g(n) of the form
$$\sum_{k=0}^{n}\bigl(k^2 + f(k,n)\bigr) = \tfrac13n^3 + \tfrac12n^2 + \tfrac16n + f(0,n) + f(1,n) + \cdots + f(n,n),$$
where f has the stated property. Since we have
$$\bigl|\tfrac12n^2 + \tfrac16n + f(0,n) + f(1,n) + \cdots + f(n,n)\bigr| \le \tfrac12n^2 + \tfrac16n^2 + C\cdot0 + C\cdot1 + \cdots + C\cdot n \le n^2 + C(n^2+n)/2 \le (C+1)n^2,$$
all such functions g(n) belong to the right-hand side of (9.16); therefore (9.16) is true.
People sometimes abuse O-notation by assuming that it gives an exact order of growth; they use it as if it specifies a lower bound as well as an upper bound. For example, an algorithm to sort n numbers might be called inefficient “because its running time is $O(n^2)$.” But a running time of $O(n^2)$ does not imply that the running time is not also O(n). There’s another notation, Big Omega, for lower bounds:
$$f(n) = \Omega\bigl(g(n)\bigr) \iff |f(n)| \ge C\,|g(n)| \quad\text{for some } C > 0. \tag{9.17}$$
We have $f(n) = \Omega(g(n))$ if and only if g(n) = O(f(n)). A sorting algorithm whose running time is $\Omega(n^2)$ is inefficient compared with one whose running time is $O(n\log n)$, if n is large enough.
Finally there’s Big Theta, which specifies an exact order of growth:
$$f(n) = \Theta\bigl(g(n)\bigr) \iff f(n) = O\bigl(g(n)\bigr) \text{ and } f(n) = \Omega\bigl(g(n)\bigr). \tag{9.18}$$
We have $f(n) = \Theta(g(n))$ if and only if $f(n) \asymp g(n)$ in the notation we saw previously, equation (9.8).
Since $\Omega$ and $\Theta$ are uppercase Greek letters, the O in O-notation must be a capital Greek Omicron. After all, Greeks invented asymptotics.
Edmund Landau [194] invented a “little oh” notation,
$$f(n) = o\bigl(g(n)\bigr) \iff |f(n)| \le \epsilon\,|g(n)| \quad\text{for all } n \ge n_0(\epsilon) \text{ and for all constants } \epsilon > 0. \tag{9.19}$$
This is essentially the relation $f(n) \prec g(n)$ of (9.3). We also have
$$f(n) \sim g(n) \iff f(n) = g(n) + o\bigl(g(n)\bigr). \tag{9.20}$$
Many authors use ‘o’ in asymptotic formulas, but a more explicit ‘O’ expression is almost always preferable. For example, the average running time of a computer method called “bubblesort” depends on the asymptotic value of the sum $P(n) = \sum_{k=0}^{n} k^{n-k}\,k!/n!$. Elementary asymptotic methods suffice to prove that $P(n) \sim \sqrt{\pi n/2}$, which means that the ratio $P(n)/\sqrt{\pi n/2}$ approaches 1 as $n \to \infty$. However, the true behavior of P(n) is best understood by considering the difference, $P(n) - \sqrt{\pi n/2}$, not the ratio:
  n    $P(n)/\sqrt{\pi n/2}$    $P(n) - \sqrt{\pi n/2}$
  1    0.798    −0.253
 10    0.878    −0.484
 20    0.904    −0.538
 30    0.918    −0.561
 40    0.927    −0.575
 50    0.934    −0.585
The numerical evidence in the middle column is not very compelling; it certainly is far from a dramatic proof that $P(n)/\sqrt{\pi n/2}$ approaches 1 rapidly, if at all. But the right-hand column shows that P(n) is very close indeed to $\sqrt{\pi n/2}$. Thus we can characterize the behavior of P(n) much better if we can derive formulas of the form
$$P(n) = \sqrt{\pi n/2} + O(1),$$
or even sharper estimates like
$$P(n) = \sqrt{\pi n/2} - \tfrac23 + O(1/\sqrt{n}).$$
Stronger methods of asymptotic analysis are needed to prove O-results, but the additional effort required to learn these stronger methods is amply compensated by the improved understanding that comes with O-bounds.
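The table of ratios and differences for P(n) can be regenerated directly from the defining sum. The following sketch uses exact integer arithmetic for the sum and floating point only for the final comparison; the function names `P` and `table_row` are chosen for this illustration.

```python
import math
from fractions import Fraction

def P(n):
    """P(n) = sum over 0 <= k <= n of k^(n-k) * k! / n!, computed exactly."""
    total = sum(k ** (n - k) * math.factorial(k) for k in range(n + 1))
    return Fraction(total, math.factorial(n))

def table_row(n):
    """Return (ratio, difference) of P(n) against sqrt(pi*n/2)."""
    approx = math.sqrt(math.pi * n / 2)
    p = float(P(n))
    return p / approx, p - approx

for n in (1, 10, 20, 30, 40, 50):
    ratio, diff = table_row(n)
    print(f"{n:3d}  {ratio:.3f}  {diff:+.3f}")
```

The ratio column creeps toward 1 slowly, while the difference column settles near a constant, which is what an estimate of the form $\sqrt{\pi n/2} + O(1)$ predicts.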
Moreover, many sorting algorithms have running times of the form
$$T(n) = A\,n\lg n + B\,n + O(\log n)$$
for some constants A and B. Analyses that stop at $T(n) \sim A\,n\lg n$ don’t tell the whole story, and it turns out to be a bad strategy to choose a sorting algorithm based just on its A value. Algorithms with a good ‘A’ often achieve this at the expense of a bad ‘B’. Since $n\lg n$ grows only slightly faster than n, the algorithm that’s faster asymptotically (the one with a slightly smaller A value) might be faster only for values of n that never actually arise in practice. Thus, asymptotic methods that allow us to go past the first term and evaluate B are necessary if we are to make the right choice of method.
Also lD, the Duraflame logarithm.
Notice that log log log n is undefined when n = 2.
Before we go on to study O, let’s talk about one more small aspect of mathematical style. Three different notations for logarithms have been used in this chapter: lg, ln, and log. We often use ‘lg’ in connection with computer methods, because binary logarithms are often relevant in such cases; and we often use ‘ln’ in purely mathematical calculations, since the formulas for natural logarithms are nice and simple. But what about ‘log’? Isn’t this the “common” base-10 logarithm that students learn in high school, the “common” logarithm that turns out to be very uncommon in mathematics and computer science? Yes; and many mathematicians confuse the issue by using ‘log’ to stand for natural logarithms or binary logarithms. There is no universal agreement here. But we can usually breathe a sigh of relief when a logarithm appears inside O-notation, because O ignores multiplicative constants. There is no difference between O(lg n), O(ln n), and O(log n), as $n \to \infty$; similarly, there is no difference between O(lg lg n), O(ln ln n), and O(log log n). We get to choose whichever we please; and the one with ‘log’ seems friendlier because it is more pronounceable. Therefore we generally use ‘log’ in all contexts where it improves readability without introducing ambiguity.
9.3 O MANIPULATION
Like any mathematical formalism, the O-notation has rules of manipulation that free us from the grungy details of its definition. Once we prove that the rules are correct, using the definition, we can henceforth work on a higher plane and forget about actually verifying that one set of functions is contained in another. We don’t even need to calculate the constants C that are implied by each O, as long as we follow rules that guarantee the existence of such constants.
The secret of being a bore is to tell everything.
- Voltaire
For example, we can prove once and for all that
$$n^m = O(n^{m'}), \quad\text{when } m \le m'; \tag{9.21}$$
$$O\bigl(f(n)\bigr) + O\bigl(g(n)\bigr) = O\bigl(|f(n)| + |g(n)|\bigr). \tag{9.22}$$
Then we can say immediately that $\frac13n^3 + \frac12n^2 + \frac16n = O(n^3) + O(n^3) + O(n^3) = O(n^3)$, without the laborious calculations in the previous section.
Here are some more rules that follow easily from the definition:
$$f(n) = O\bigl(f(n)\bigr); \tag{9.23}$$
$$c\cdot O\bigl(f(n)\bigr) = O\bigl(f(n)\bigr), \quad\text{if } c \text{ is constant}; \tag{9.24}$$
$$O\bigl(O(f(n))\bigr) = O\bigl(f(n)\bigr); \tag{9.25}$$
$$O\bigl(f(n)\bigr)\,O\bigl(g(n)\bigr) = O\bigl(f(n)\,g(n)\bigr); \tag{9.26}$$
$$O\bigl(f(n)\,g(n)\bigr) = f(n)\,O\bigl(g(n)\bigr). \tag{9.27}$$
Exercise 9 proves (9.22), and the proofs of the others are similar. We can always replace something of the form on the left by what’s on the right, regardless of the side conditions on the variable n.
Equations (9.27) and (9.23) allow us to derive the identity $O(f(n)^2) = O(f(n))^2$. This sometimes helps avoid parentheses, since we can write
$O(\log n)^2$ instead of $O((\log n)^2)$.
Both of these are preferable to ‘$O(\log^2 n)$’, which is ambiguous because some authors use it to mean ‘$O(\log\log n)$’.
Can we also write
$O(\log n)^{-1}$ instead of $O((\log n)^{-1})$?
No! This is an abuse of notation, since the set of functions $1/O(\log n)$ is neither a subset nor a superset of $O(1/\log n)$. We could legitimately substitute $\Omega(\log n)^{-1}$ for $O((\log n)^{-1})$, but this would be awkward. So we’ll restrict our use of “exponents outside the O” to constant, positive integer exponents.
(Note: The formula $O(f(n))^2$ does not denote the set of all functions $g(n)^2$ where g(n) is in O(f(n)); such functions $g(n)^2$ cannot be negative, but the set $O(f(n))^2$ includes negative functions. In general, when S is a set, the notation $S^2$ stands for the set of all products $s_1s_2$ with $s_1$ and $s_2$ in S, not for the set of all squares $s^2$ with $s \in S$.)
Power series give us some of the most useful operations of all. If the sum
$$S(z) = \sum_{n\ge0} a_n z^n$$
converges absolutely for some complex number $z = z_0$, then
$$S(z) = O(1), \quad\text{for all } |z| \le |z_0|.$$
This is obvious, because
$$|S(z)| \le \sum_{n\ge0}|a_n|\,|z|^n \le \sum_{n\ge0}|a_n|\,|z_0|^n = C < \infty.$$
In particular, S(z) = O(1) as $z \to 0$, and S(1/n) = O(1) as $n \to \infty$, provided only that S(z) converges for at least one nonzero value of z. We can use this principle to truncate a power series at any convenient point and estimate the remainder with O. For example, not only is S(z) = O(1), but
$$S(z) = a_0 + O(z),$$
$$S(z) = a_0 + a_1z + O(z^2),$$
and so on, because
$$S(z) = \sum_{0\le k<m} a_k z^k + z^m\sum_{n\ge m} a_n z^{n-m}$$
and the latter sum is O(1). Table 438 lists some of the most useful asymptotic formulas, half of which are simply based on truncation of power series according to this rule.
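The truncation rule says that the remainder after m terms, divided by $z^m$, stays bounded as $z \to 0$. A small numerical sketch for the exponential series illustrates this; the function name `remainder_ratio` and the sample points are choices made for the example, and ordinary floating point is assumed to be accurate enough at these values of z.

```python
import math

def remainder_ratio(z, m):
    """(e^z - sum_{k<m} z^k/k!) / z^m, which the truncation rule says stays bounded."""
    partial = sum(z ** k / math.factorial(k) for k in range(m))
    return (math.exp(z) - partial) / z ** m

for z in (0.5, 0.1, 0.01):
    print(z, remainder_ratio(z, 3))  # approaches 1/3! as z shrinks
```

The bounded ratio is the hidden constant in $e^z = 1 + z + \frac12z^2 + O(z^3)$; here it tends to the next coefficient, 1/6.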
Dirichlet series, which are sums of the form $\sum_{k\ge1} a_k/k^z$, can be truncated in a similar way: If a Dirichlet series converges absolutely when $z = z_0$, we can truncate it at any term and get the approximation
$$\sum_{1\le k<m} a_k/k^z + O(m^{-z}),$$
valid for $\Re z \ge \Re z_0$. The asymptotic formula for Bernoulli numbers $B_n$ in Table 438 illustrates this principle.
Remember that $\Re$ stands for “real part.”
On the other hand, the asymptotic formulas for $H_n$, n!, and $\pi(n)$ in Table 438 are not truncations of convergent series; if we extended them indefinitely they would diverge for all values of n. This is particularly easy to see in the case of $\pi(n)$, since we have already observed in Section 7.3, Example 5, that the power series $\sum_{k\ge0} k!/(\ln n)^k$ is everywhere divergent. Yet these truncations of divergent series turn out to be useful approximations.
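Table 438 Asymptotic approximations, valid as $n \to \infty$ and $z \to 0$.
$$H_n = \ln n + \gamma + \frac{1}{2n} - \frac{1}{12n^2} + \frac{1}{120n^4} + O\Bigl(\frac{1}{n^6}\Bigr). \tag{9.28}$$
$$n! = \sqrt{2\pi n}\Bigl(\frac{n}{e}\Bigr)^{n}\Bigl(1 + \frac{1}{12n} + \frac{1}{288n^2} - \frac{139}{51840n^3} + O\Bigl(\frac{1}{n^4}\Bigr)\Bigr). \tag{9.29}$$
$$B_n = 2\,[n \text{ even}]\,(-1)^{n/2-1}\frac{n!}{(2\pi)^n}\bigl(1 + 2^{-n} + 3^{-n} + O(4^{-n})\bigr). \tag{9.30}$$
$$\pi(n) = \frac{n}{\ln n} + \frac{n}{(\ln n)^2} + \frac{2!\,n}{(\ln n)^3} + \frac{3!\,n}{(\ln n)^4} + O\Bigl(\frac{n}{(\log n)^5}\Bigr). \tag{9.31}$$
$$e^z = 1 + z + \frac{z^2}{2!} + \frac{z^3}{3!} + \frac{z^4}{4!} + O(z^5). \tag{9.32}$$
$$\ln(1+z) = z - \frac{z^2}{2} + \frac{z^3}{3} - \frac{z^4}{4} + O(z^5). \tag{9.33}$$
$$\frac{1}{1-z} = 1 + z + z^2 + z^3 + z^4 + O(z^5). \tag{9.34}$$
$$(1+z)^{\alpha} = 1 + \alpha z + \binom{\alpha}{2}z^2 + \binom{\alpha}{3}z^3 + \binom{\alpha}{4}z^4 + O(z^5). \tag{9.35}$$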
An asymptotic approximation is said to have absolute error O(g(n)) if it has the form f(n) + O(g(n)) where f(n) doesn’t involve O. The approximation has relative error O(g(n)) if it has the form f(n)(1 + O(g(n))) where f(n) doesn’t involve O. For example, the approximation for $H_n$ in Table 438 has absolute error $O(n^{-6})$; the approximation for n! has relative error $O(n^{-4})$. (The right-hand side of (9.29) doesn’t actually have the required form $f(n)\bigl(1 + O(n^{-4})\bigr)$, but we could rewrite it
$$\sqrt{2\pi n}\Bigl(\frac{n}{e}\Bigr)^{n}\Bigl(1 + \frac{1}{12n} + \frac{1}{288n^2} - \frac{139}{51840n^3}\Bigr)\bigl(1 + O(n^{-4})\bigr)$$
if we wanted to; a similar calculation is the subject of exercise 12.) The absolute error of this approximation is $O(n^{n-3.5}e^{-n})$. Absolute error is related to the number of correct decimal digits to the right of the decimal point if the O term is ignored; relative error corresponds to the number of correct “significant figures.”
(Relative error is nice for taking reciprocals, because $1/(1 + O(\epsilon)) = 1 + O(\epsilon)$.)
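The difference between absolute and relative error is easy to see numerically for the factorial approximation. The sketch below evaluates the right-hand side of (9.29) with the O term dropped and compares it with the exact value; `stirling` and `errors` are illustrative names, and double-precision floating point is assumed to be adequate for the sample values of n.

```python
import math

def stirling(n):
    """Right-hand side of (9.29) with the O(1/n^4) term dropped."""
    series = 1 + 1 / (12 * n) + 1 / (288 * n**2) - 139 / (51840 * n**3)
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n * series

def errors(n):
    """Return (absolute error, relative error) against the exact factorial."""
    exact = math.factorial(n)
    approx = stirling(n)
    return approx - exact, (approx - exact) / exact

for n in (5, 10, 20, 40):
    abs_err, rel_err = errors(n)
    print(n, abs_err, rel_err, rel_err * n**4)
```

The last column stays small and bounded, as a relative error of $O(n^{-4})$ requires, while the absolute error grows enormously along with n! itself.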
We can use truncation of power series to prove the general laws
$$\ln\bigl(1 + O(f(n))\bigr) = O\bigl(f(n)\bigr), \quad\text{if } f(n) \prec 1; \tag{9.36}$$
$$e^{O(f(n))} = 1 + O\bigl(f(n)\bigr), \quad\text{if } f(n) = O(1). \tag{9.37}$$
(Here we assume that $n \to \infty$; similar formulas hold for $\ln(1 + O(f(x)))$ and $e^{O(f(x))}$ as $x \to 0$.) For example, let $\ln(1 + g(n))$ be any function belonging to the left side of (9.36). Then there are constants C, $n_0$, and c such that
$$|g(n)| \le C\,|f(n)| \le c < 1, \quad\text{for all } n \ge n_0.$$
It follows that the infinite sum
$$\ln\bigl(1 + g(n)\bigr) = g(n)\cdot\bigl(1 - \tfrac12g(n) + \tfrac13g(n)^2 - \cdots\bigr)$$
converges for all $n \ge n_0$, and the parenthesized series is bounded by the constant $1 + \frac12c + \frac13c^2 + \cdots$. This proves (9.36), and the proof of (9.37) is similar. Equations (9.36) and (9.37) combine to give the useful formula
$$\bigl(1 + O(f(n))\bigr)^{O(g(n))} = 1 + O\bigl(f(n)\,g(n)\bigr), \quad\text{if } f(n) \prec 1 \text{ and } f(n)\,g(n) = O(1). \tag{9.38}$$
Problem 1: Return to the Wheel of Fortune.
Let’s try our luck now at a few asymptotic problems. In Chapter 3 we derived equation (3.13) for the number of winning positions in a certain game:
$$W = \lfloor N/K\rfloor + \tfrac12K^2 + \tfrac52K - 3, \qquad K = \lfloor\sqrt[3]{N}\rfloor.$$
And we promised that an asymptotic version of W would be derived in Chapter 9. Well, here we are in Chapter 9; let’s try to estimate W, as $N \to \infty$.
The main idea here is to remove the floor brackets, replacing K by $N^{1/3} + O(1)$. Then we can go further and write
$$K = N^{1/3}\bigl(1 + O(N^{-1/3})\bigr);$$
this is called “pulling out the large part.” (We will be using this trick a lot.) Now we have
$$K^2 = N^{2/3}\bigl(1 + O(N^{-1/3})\bigr)^2 = N^{2/3}\bigl(1 + O(N^{-1/3})\bigr) = N^{2/3} + O(N^{1/3})$$
by (9.38) and (9.26). Similarly
$$\lfloor N/K\rfloor = N^{1-1/3}\bigl(1 + O(N^{-1/3})\bigr)^{-1} + O(1) = N^{2/3}\bigl(1 + O(N^{-1/3})\bigr) + O(1) = N^{2/3} + O(N^{1/3}).$$
It follows that the number of winning positions is
$$W = N^{2/3} + O(N^{1/3}) + \tfrac12\bigl(N^{2/3} + O(N^{1/3})\bigr) + O(N^{1/3}) + O(1) = \tfrac32N^{2/3} + O(N^{1/3}). \tag{9.39}$$
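Estimate (9.39) can be compared with the exact formula for W. In the sketch below, `icbrt`, `winners`, and `scaled_error` are hypothetical helper names; the integer cube root is corrected after a floating-point guess so that the floor is exact.

```python
def icbrt(n):
    """Largest integer K with K**3 <= n, for integer n >= 0."""
    k = round(n ** (1.0 / 3.0))
    while k ** 3 > n:
        k -= 1
    while (k + 1) ** 3 <= n:
        k += 1
    return k

def winners(N):
    """W = floor(N/K) + K^2/2 + 5K/2 - 3 with K = floor(cube root of N)."""
    K = icbrt(N)
    return N // K + (K * K + 5 * K) / 2 - 3

def scaled_error(N):
    """(W - 1.5 N^(2/3)) / N^(1/3); (9.39) says this stays bounded."""
    root = N ** (1.0 / 3.0)
    return (winners(N) - 1.5 * root * root) / root

for N in (10**3, 10**6, 10**9, 10**12):
    print(N, winners(N), scaled_error(N))
```

The third column does not shrink to zero, and it is not supposed to: the O term in (9.39) only promises that it remains bounded.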
Notice how the O terms absorb one another until only one remains; this is typical, and it illustrates why O-notation is useful in the middle of a formula.
Problem 2: Perturbation of Stirling’s formula.
Stirling’s approximation for n! is undoubtedly the most famous asymptotic formula of all. We will prove it later in this chapter; for now, let’s just try to get better acquainted with its properties. We can write one version of the approximation in the form
$$n! = \sqrt{2\pi n}\Bigl(\frac{n}{e}\Bigr)^{n}\Bigl(1 + \frac{a}{n} + \frac{b}{n^2} + O(n^{-3})\Bigr), \quad\text{as } n \to \infty, \tag{9.40}$$
for certain constants a and b. Since this holds for all large n, it must also be asymptotically true when n is replaced by n − 1:
$$(n-1)! = \sqrt{2\pi(n-1)}\Bigl(\frac{n-1}{e}\Bigr)^{n-1}\Bigl(1 + \frac{a}{n-1} + \frac{b}{(n-1)^2} + O\bigl((n-1)^{-3}\bigr)\Bigr). \tag{9.41}$$
We know, of course, that (n − 1)! = n!/n; hence the right-hand side of this formula must simplify to the right-hand side of (9.40), divided by n.
Let us therefore try to simplify (9.41). The first factor becomes tractable if we pull out the large part:
$$\sqrt{2\pi(n-1)} = \sqrt{2\pi n}\,(1 - n^{-1})^{1/2} = \sqrt{2\pi n}\Bigl(1 - \frac{1}{2n} - \frac{1}{8n^2} + O(n^{-3})\Bigr).$$
Equation (9.35) has been used here.
Similarly we have
$$\frac{a}{n-1} = \frac{a}{n} + \frac{a}{n^2} + O(n^{-3});$$
$$\frac{b}{(n-1)^2} = \frac{b}{n^2}(1 - n^{-1})^{-2} = \frac{b}{n^2} + O(n^{-3});$$
$$O\bigl((n-1)^{-3}\bigr) = O\bigl(n^{-3}(1 - n^{-1})^{-3}\bigr) = O(n^{-3}).$$
The only thing in (9.41) that’s slightly tricky to deal with is the factor $(n-1)^{n-1}$, which equals
$$n^{n-1}(1 - n^{-1})^{n-1} = n^{n-1}(1 - n^{-1})^{n}\bigl(1 + n^{-1} + n^{-2} + O(n^{-3})\bigr).$$
(We are expanding everything out until we get a relative error of $O(n^{-3})$, because the relative error of a product is the sum of the relative errors of the individual factors. All of the $O(n^{-3})$ terms will coalesce.)
In order to expand $(1 - n^{-1})^n$, we first compute $\ln(1 - n^{-1})$ and then form the exponential, $e^{n\ln(1 - n^{-1})}$:
$$(1 - n^{-1})^n = \exp\bigl(n\ln(1 - n^{-1})\bigr)$$
$$= \exp\bigl(n(-n^{-1} - \tfrac12n^{-2} - \tfrac13n^{-3} + O(n^{-4}))\bigr)$$
$$= \exp\bigl(-1 - \tfrac12n^{-1} - \tfrac13n^{-2} + O(n^{-3})\bigr)$$
$$= \exp(-1)\cdot\exp(-\tfrac12n^{-1})\cdot\exp(-\tfrac13n^{-2})\cdot\exp\bigl(O(n^{-3})\bigr)$$
$$= \exp(-1)\cdot\bigl(1 - \tfrac12n^{-1} + \tfrac18n^{-2} + O(n^{-3})\bigr)\cdot\bigl(1 - \tfrac13n^{-2} + O(n^{-4})\bigr)\cdot\bigl(1 + O(n^{-3})\bigr)$$
$$= e^{-1}\bigl(1 - \tfrac12n^{-1} - \tfrac{5}{24}n^{-2} + O(n^{-3})\bigr).$$
Here we use the notation exp z instead of $e^z$, since it allows us to work with a complicated exponent on the main line of the formula instead of in the superscript position. We must expand $\ln(1 - n^{-1})$ with absolute error $O(n^{-4})$ in order to end with a relative error of $O(n^{-3})$, because the logarithm is being multiplied by n.
The right-hand side of (9.41) has now been reduced to $\sqrt{2\pi n}$ times $n^{n-1}/e^n$ times a product of several factors:
$$\bigl(1 - \tfrac12n^{-1} - \tfrac18n^{-2} + O(n^{-3})\bigr)$$
$$\cdot\,\bigl(1 + n^{-1} + n^{-2} + O(n^{-3})\bigr)$$
$$\cdot\,\bigl(1 - \tfrac12n^{-1} - \tfrac{5}{24}n^{-2} + O(n^{-3})\bigr)$$
$$\cdot\,\bigl(1 + an^{-1} + (a+b)n^{-2} + O(n^{-3})\bigr).$$
Multiplying these out and absorbing all asymptotic terms into one $O(n^{-3})$ yields
$$1 + an^{-1} + \bigl(a + b - \tfrac{1}{12}\bigr)n^{-2} + O(n^{-3}).$$
Hmmm; we were hoping to get $1 + an^{-1} + bn^{-2} + O(n^{-3})$, since that’s what we need to match the right-hand side of (9.40). Has something gone awry? No, everything is fine; Table 438 tells us that $a = \frac{1}{12}$, hence $a + b - \frac{1}{12} = b$.
This perturbation argument doesn’t prove the validity of Stirling’s approximation, but it does prove something: It proves that formula (9.40) cannot be valid unless $a = \frac{1}{12}$. If we had replaced the $O(n^{-3})$ in (9.40) by $cn^{-3} + O(n^{-4})$ and carried out our calculations to a relative error of $O(n^{-4})$, we could have deduced that $b = \frac{1}{288}$. (This is not the easiest way to determine the values of a and b, but it works.)
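The multiplication of the four factors can be double-checked mechanically by treating each factor as a polynomial in u = 1/n and discarding everything of order $u^3$ or higher. The sketch below is one way to do that with exact fractions; the list representation and the names `mul`, `fixed_part`, and `with_ab` are assumptions of this illustration.

```python
from fractions import Fraction as F

ORDER = 3  # keep coefficients of u^0, u^1, u^2 where u = 1/n; drop O(u^3)

def mul(p, q):
    """Multiply two truncated series in u = 1/n, discarding terms of order u^3 and up."""
    out = [F(0)] * ORDER
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j < ORDER:
                out[i + j] += a * b
    return out

sqrt_factor = [F(1), F(-1, 2), F(-1, 8)]    # (1 - 1/n)^(1/2)
inverse_factor = [F(1), F(1), F(1)]         # (1 - 1/n)^(-1)
power_factor = [F(1), F(-1, 2), F(-5, 24)]  # e * (1 - 1/n)^n

fixed_part = mul(mul(sqrt_factor, inverse_factor), power_factor)
print(fixed_part)  # coefficients of 1, 1/n, 1/n^2

def with_ab(a, b):
    """Multiply in the factor 1 + a/n + (a + b)/n^2 coming from (9.41)."""
    return mul(fixed_part, [F(1), F(a), F(a) + F(b)])

print(with_ab(F(1, 12), F(1, 288)))
```

The three factors that do not involve a or b multiply to $1 - \frac{1}{12}n^{-2}$, which is where the $-\frac{1}{12}$ in the final coefficient comes from.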
Problem 3: The nth prime number.
Equation (9.31) is an asymptotic formula for $\pi(n)$, the number of primes that do not exceed n. If we replace n by $p = P_n$, the nth prime number, we have $\pi(p) = n$; hence
$$n = \frac{p}{\ln p} + O\Bigl(\frac{p}{(\log p)^2}\Bigr) \tag{9.42}$$
as $n \to \infty$. Let us try to “solve” this equation for p; then we will know the approximate size of the nth prime.
The first step is to simplify the O term. If we divide both sides by $p/\ln p$, we find that $n\ln p/p \to 1$; hence $p/\ln p = O(n)$ and
$$O\Bigl(\frac{p}{(\log p)^2}\Bigr) = O\Bigl(\frac{n}{\log p}\Bigr) = O\Bigl(\frac{n}{\log n}\Bigr).$$
(We have $(\log p)^{-1} \le (\log n)^{-1}$ because $p \ge n$.)
The second step is to transpose the two sides of (9.42), except for the O term. This is legal because of the general rule
$$a_n = b_n + O\bigl(f(n)\bigr) \iff b_n = a_n + O\bigl(f(n)\bigr). \tag{9.43}$$
(Each of these equations follows from the other if we multiply both sides by −1 and then add $a_n + b_n$ to both sides.) Hence
$$\frac{p}{\ln p} = n + O\Bigl(\frac{n}{\log n}\Bigr) = n\bigl(1 + O(1/\log n)\bigr),$$
and we have
$$p = n\ln p\,\bigl(1 + O(1/\log n)\bigr). \tag{9.44}$$
This is an “approximate recurrence” for $p = P_n$ in terms of itself. Our goal is to change it into an “approximate closed form,” and we can do this by unfolding the recurrence asymptotically. So let’s try to unfold (9.44).
By taking logarithms of both sides we deduce that
$$\ln p = \ln n + \ln\ln p + O(1/\log n). \tag{9.45}$$
This value can be substituted for ln p in (9.44), but we would like to get rid of all p’s on the right before making the substitution. Somewhere along the line, that last p must disappear; we can’t get rid of it in the normal way for recurrences, because (9.44) doesn’t specify initial conditions for small p.
One way to do the job is to start by proving the weaker result $p = O(n^2)$. This follows if we square (9.44) and divide by $pn^2$,
$$\frac{p}{n^2} = \frac{(\ln p)^2}{p}\bigl(1 + O(1/\log n)\bigr),$$
since the right side approaches zero as $n \to \infty$. OK, we know that $p = O(n^2)$; therefore $\log p = O(\log n)$ and $\log\log p = O(\log\log n)$. We can now conclude from (9.45) that
$$\ln p = \ln n + O(\log\log n);$$
in fact, with this new estimate in hand we can conclude that $\ln\ln p = \ln\ln n + O(\log\log n/\log n)$, and (9.45) now yields
$$\ln p = \ln n + \ln\ln n + O(\log\log n/\log n).$$
And we can plug this into the right-hand side of (9.44), obtaining
$$p = n\ln n + n\ln\ln n + O(n).$$
This is the approximate size of the nth prime.
We can refine this estimate by using a better approximation of n(n) in
place of (9.42). The next term of (9.31) tells us that
Get out the scratch
proceeding as before, we obtain the recurrence
paper again, gang.
p = nlnp (1 i- (lnp)
‘)-‘(1
+
O(l/logn)‘)
,
(9.46)
which has a relative error of 0( 1 /logn)2 instead of 0( 1
/logn).
Taking loga-
rithms and retaining proper accuracy (but not too much) now yields
lnp =
lnn+lnlnp+0(1/logn)
= Inn
l+
(
lnlnp
Ann
+
O(l/logn)2)
;
lnlnn
lnlnp =
lnlnn+
Inn
+o(q$y,,
.
Finally we substitute these results into (9.47) and our answer finds its way
out:
P,
=
nlnn+nlnlnn-n+n
%+0(C).
b@)
For example, when $n = 10^6$ this estimate comes to $15631363.8 + O(n/\log n)$;
the millionth prime is actually 15485863. Exercise 21 shows that a still more
accurate approximation to $P_n$ results if we begin with a still more accurate
approximation to $\pi(n)$ in place of (9.46).
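As a quick sanity check, the following Python sketch evaluates the four explicit terms of the refined estimate for $P_n$ at $n = 10^6$ and compares the result with the millionth prime quoted above. The function name `nth_prime_estimate` is ours, not the book's, and the $O(n/\log n)$ term is simply dropped, so the last digit may differ slightly from the figure in the text.

```python
import math

def nth_prime_estimate(n: int) -> float:
    """n ln n + n ln ln n - n + n (ln ln n)/(ln n); the O(n/log n) term is ignored."""
    ln_n = math.log(n)
    lnln_n = math.log(ln_n)
    return n * ln_n + n * lnln_n - n + n * lnln_n / ln_n

MILLIONTH_PRIME = 15485863  # value quoted in the text, not recomputed here

estimate = nth_prime_estimate(10**6)
relative_error = (estimate - MILLIONTH_PRIME) / MILLIONTH_PRIME
print(f"estimate = {estimate:.1f}, relative error = {relative_error:.4f}")
```

The relative error of a bit under one percent is comfortably inside the $O(1/\log n)^2$ relative accuracy that the derivation promises.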
Problem 4: A sum from an old final exam.
When Concrete Mathematics was first taught at Stanford University dur-
ing the 1970-1971 term, students were asked for the asymptotic value of the
sum
$$S_n = \frac{1}{n^2+1} + \frac{1}{n^2+2} + \cdots + \frac{1}{n^2+n}, \tag{9.49}$$
with an absolute error of $O(n^{-7})$.
Let’s imagine that we’ve just been given
this problem on a (take-home) final; what is our first instinctive reaction?
No, we don’t panic. Our first reaction is to THINK BIG. If we set $n = 10^{100}$,
say, and look at the sum, we see that it consists of $n$ terms, each of
which is slightly less than $1/n^2$; hence the sum is slightly less than $1/n$. In
general, we can usually get a decent start on an asymptotic problem by taking
stock of the situation and getting a ballpark estimate of the answer.
Let’s try to improve the rough estimate by pulling out the largest part
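of each term. We have
$$\frac{1}{n^2+k} = \frac{1}{n^2(1+k/n^2)} = \frac{1}{n^2}\Bigl(1 - \frac{k}{n^2} + \frac{k^2}{n^4} - \frac{k^3}{n^6} + O\Bigl(\frac{k^4}{n^8}\Bigr)\Bigr),$$
and so it’s natural to try summing all these approximations:
$$\begin{aligned}
\frac{1}{n^2+1} &= \frac{1}{n^2} - \frac{1}{n^4} + \frac{1^2}{n^6} - \frac{1^3}{n^8} + O\Bigl(\frac{1^4}{n^{10}}\Bigr);\\
\frac{1}{n^2+2} &= \frac{1}{n^2} - \frac{2}{n^4} + \frac{2^2}{n^6} - \frac{2^3}{n^8} + O\Bigl(\frac{2^4}{n^{10}}\Bigr);\\
&\;\;\vdots\\
\frac{1}{n^2+n} &= \frac{1}{n^2} - \frac{n}{n^4} + \frac{n^2}{n^6} - \frac{n^3}{n^8} + O\Bigl(\frac{n^4}{n^{10}}\Bigr);\\
S_n &= \frac{n}{n^2} - \frac{n(n+1)}{2n^4} + \cdots.
\end{aligned}$$
It looks as if we’re getting $S_n = n^{-1} - \frac12 n^{-2} + O(n^{-3})$, based on the sums of
the first two columns; but the calculations are getting hairy.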
If we persevere in this approach, we will ultimately reach the goal; but
we won’t bother to sum the other columns, for two reasons: First, the last
column is going to give us terms that are $O(n^{-6})$, when $n/2 \le k \le n$, so we
will have an error of $O(n^{-5})$; that’s too big, and we will have to include yet
another column in the expansion. Could the exam-giver have been so sadistic?
We suspect that there must be a better way. Second, there is indeed a much
better way, staring us right in the face.
[Margin: Do pajamas have buttons?]
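Namely, we know a closed form for $S_n$: It’s just $H_{n^2+n} - H_{n^2}$. And we
know a good approximation for harmonic numbers, so we just apply it twice:
$$H_{n^2+n} = \ln(n^2+n) + \gamma + \frac{1}{2(n^2+n)} - \frac{1}{12(n^2+n)^2} + O\Bigl(\frac{1}{n^8}\Bigr);$$
$$H_{n^2} = \ln n^2 + \gamma + \frac{1}{2n^2} - \frac{1}{12n^4} + O\Bigl(\frac{1}{n^8}\Bigr).$$
Now we can pull out large terms and simplify, as we did when looking at
Stirling’s approximation. We have
$$\ln(n^2+n) = \ln n^2 + \ln\Bigl(1 + \frac1n\Bigr) = \ln n^2 + \frac1n - \frac1{2n^2} + \frac1{3n^3} - \cdots;$$
$$\frac{1}{n^2+n} = \frac1{n^2} - \frac1{n^3} + \frac1{n^4} - \cdots;$$
$$\frac{1}{(n^2+n)^2} = \frac1{n^4} - \frac2{n^5} + \frac3{n^6} - \cdots.$$
So there’s lots of helpful cancellation, and we find
$$\begin{aligned}S_n ={}& n^{-1} - \tfrac12n^{-2} + \tfrac13n^{-3} - \tfrac14n^{-4} + \tfrac15n^{-5} - \tfrac16n^{-6}\\
&- \tfrac12n^{-3} + \tfrac12n^{-4} - \tfrac12n^{-5} + \tfrac12n^{-6}\\
&+ \tfrac16n^{-5} - \tfrac14n^{-6}\end{aligned}$$
plus terms that are $O(n^{-7})$. A bit of arithmetic and we’re home free:
$$S_n = n^{-1} - \tfrac12n^{-2} - \tfrac16n^{-3} + \tfrac14n^{-4} - \tfrac2{15}n^{-5} + \tfrac1{12}n^{-6} + O(n^{-7}). \tag{9.50}$$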
It would be nice if we could check this answer numerically, as we did
when we derived exact results in earlier chapters. Asymptotic formulas are
harder to verify; an arbitrarily large constant may be hiding in a $O$ term,
so any numerical test is inconclusive. But in practice, we have no reason to
believe that an adversary is trying to trap us, so we can assume that the
unknown $O$-constants are reasonably small. With a pocket calculator we find
that $S_4 = \frac1{17} + \frac1{18} + \frac1{19} + \frac1{20} = 0.2170107$; and our asymptotic estimate when
$n = 4$ comes to
$$\tfrac14\Bigl(1 + \tfrac14\bigl(-\tfrac12 + \tfrac14\bigl(-\tfrac16 + \tfrac14\bigl(\tfrac14 + \tfrac14\bigl(-\tfrac2{15} + \tfrac14\cdot\tfrac1{12}\bigr)\bigr)\bigr)\bigr)\Bigr) = 0.2170125.$$
If we had made an error of, say, $\frac1{12}$ in the term for $n^{-6}$, a difference of $\frac1{12}\cdot\frac1{4096}$
would have shown up in the fifth decimal place; so our asymptotic answer is
probably correct.
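The pocket-calculator check can be repeated with exact rational arithmetic. The sketch below (the names `exact_sum` and `asymptotic_sum` are ours) compares the exact value of the sum $1/(n^2+1) + \cdots + 1/(n^2+n)$ with the six explicit terms of (9.50), and prints the gap scaled by $n^7$, which should stay bounded if the error really is $O(n^{-7})$.

```python
from fractions import Fraction

def exact_sum(n: int) -> Fraction:
    """S_n = 1/(n^2+1) + ... + 1/(n^2+n), computed exactly."""
    return sum(Fraction(1, n * n + k) for k in range(1, n + 1))

# Coefficients of n^-1, n^-2, ..., n^-6 in (9.50).
COEFFS = [Fraction(1), Fraction(-1, 2), Fraction(-1, 6),
          Fraction(1, 4), Fraction(-2, 15), Fraction(1, 12)]

def asymptotic_sum(n: int) -> Fraction:
    """Right-hand side of (9.50) with the O(n^-7) term dropped."""
    return sum(c / n ** (j + 1) for j, c in enumerate(COEFFS))

for n in (4, 10, 100):
    gap = exact_sum(n) - asymptotic_sum(n)
    print(n, float(exact_sum(n)), float(asymptotic_sum(n)), float(gap * n**7))
```

A numerical test like this cannot prove the $O$ bound, as the text points out, but a scaled gap that stays small for several values of $n$ is reassuring.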
Problem 5: An infinite sum.
We turn now to an asymptotic question posed by Solomon Golomb
[122]:
What is the approximate value of
$$S_n = \sum_{k\ge1}\frac{1}{kN_n(k)^2}, \tag{9.51}$$
where $N_n(k)$ is the number of digits required to write $k$ in radix $n$ notation?
First let’s try again for a ballpark estimate. The number of digits, $N_n(k)$,
is approximately $\log_n k = \log k/\log n$; so the terms of this sum are roughly
$(\log n)^2/k(\log k)^2$. Summing on $k$ gives $\approx (\log n)^2\sum_{k\ge2} 1/k(\log k)^2$, and this
sum converges to a constant value because it can be compared to the integral
$$\int_2^\infty \frac{dx}{x(\ln x)^2} = -\frac{1}{\ln x}\Big|_2^\infty = \frac{1}{\ln 2}.$$
Therefore we expect $S_n$ to be about $C(\log n)^2$, for some constant $C$.
Hand-wavy analyses like this are useful for orientation, but we need better
estimates to solve the problem. One idea is to express $N_n(k)$ exactly:
$$N_n(k) = \lfloor\log_n k\rfloor + 1. \tag{9.52}$$
Thus, for example, $k$ has three radix $n$ digits when $n^2 \le k < n^3$, and this
happens precisely when $\lfloor\log_n k\rfloor = 2$. It follows that $N_n(k) > \log_n k$, hence
$S_n = \sum_{k\ge1} 1/kN_n(k)^2 < 1 + (\log n)^2\sum_{k\ge2} 1/k(\log k)^2$.
Proceeding as in Problem 1, we can try to write $N_n(k) = \log_n k + O(1)$
and substitute this into the formula for $S_n$. The term represented here by $O(1)$
is always between 0 and 1, and it is about $\frac12$ on the average, so it seems rather
well-behaved. But still, this isn’t a good enough approximation to tell us
about $S_n$; it gives us zero significant figures (that is, high relative error) when
$k$ is small, and these are the terms that contribute the most to the sum. We
need a different idea.
The key (as in Problem 4) is to use our manipulative skills to put the
sum into a more tractable form, before we resort to asymptotic estimates. We
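can introduce a new variable of summation, $m = N_n(k)$:
$$\begin{aligned}S_n = \sum_{k,m\ge1}\frac{[m = N_n(k)]}{km^2} &= \sum_{k,m\ge1}\frac{[n^{m-1}\le k<n^m]}{km^2}\\
&= \sum_{m\ge1}\frac{1}{m^2}\bigl(H_{n^m-1} - H_{n^{m-1}-1}\bigr).\end{aligned}$$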
This may look worse than the sum we began with, but it’s actually a step for-
ward, because we have very good approximations for the harmonic numbers.
Still, we hold back and try to simplify some more. No need to rush into
asymptotics. Summation by parts allows us to group the terms for each value
of $H_{n^m-1}$ that we need to approximate:
$$S_n = \sum_{k\ge1} H_{n^k-1}\Bigl(\frac{1}{k^2} - \frac{1}{(k+1)^2}\Bigr).$$
For example, $H_{n^2-1}$ is multiplied by $1/2^2$ and then by $-1/3^2$. (We have used
the fact that $H_{n^0-1} = H_0 = 0$.)
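Now we’re ready to expand the harmonic numbers. Our experience with
estimating $(n-1)!$ has taught us that it will be easier to estimate $H_{n^k}$ than
$H_{n^k-1}$, since the $(n^k-1)$’s will be messy; therefore we write
$$\begin{aligned}H_{n^k-1} = H_{n^k} - \frac1{n^k} &= \ln n^k + \gamma + \frac1{2n^k} + O\Bigl(\frac1{n^{2k}}\Bigr) - \frac1{n^k}\\
&= k\ln n + \gamma - \frac1{2n^k} + O\Bigl(\frac1{n^{2k}}\Bigr).\end{aligned}$$
Our sum now reduces to
$$\begin{aligned}S_n &= \sum_{k\ge1}\Bigl(k\ln n + \gamma - \frac1{2n^k} + O\Bigl(\frac1{n^{2k}}\Bigr)\Bigr)\Bigl(\frac1{k^2} - \frac1{(k+1)^2}\Bigr)\\
&= (\ln n)\Sigma_1 + \gamma\Sigma_2 - \tfrac12\Sigma_3(n) + O\bigl(\Sigma_3(n^2)\bigr).\end{aligned} \tag{9.53}$$
[Margin: Into a Big Oh.]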
There are four easy pieces left: $\Sigma_1$, $\Sigma_2$, $\Sigma_3(n)$, and $\Sigma_3(n^2)$.
Let’s do the $\Sigma_3$’s first, since $\Sigma_3(n^2)$ is the $O$ term; then we’ll see what
sort of error we’re getting. (There’s no sense carrying out other calculations
with perfect accuracy if they will be absorbed into a $O$ anyway.) This sum is
simply a power series,
$$\Sigma_3(x) = \sum_{k\ge1}\Bigl(\frac1{k^2} - \frac1{(k+1)^2}\Bigr)x^{-k},$$
and the series converges when $x \ge 1$ so we can truncate it at any desired point.
If we stop $\Sigma_3(n^2)$ at the term for $k = 1$, we get $\Sigma_3(n^2) = O(n^{-2})$; hence (9.53)
has an absolute error of $O(n^{-2})$. (To decrease this absolute error, we could
use a better approximation to $H_{n^k}$; but $O(n^{-2})$ is good enough for now.) If
we truncate $\Sigma_3(n)$ at the term for $k = 2$, we get $\Sigma_3(n) = \frac34n^{-1} + O(n^{-2})$;
this is all the accuracy we need.
We might as well do $\Sigma_2$ now, since it is so easy:
$$\Sigma_2 = \sum_{k\ge1}\Bigl(\frac1{k^2} - \frac1{(k+1)^2}\Bigr).$$
This is the telescoping series $(1 - \frac14) + (\frac14 - \frac19) + (\frac19 - \frac1{16}) + \cdots = 1$.
Finally, $\Sigma_1$ gives us the leading term of $S_n$, the coefficient of $\ln n$ in (9.53):
$$\Sigma_1 = \sum_{k\ge1}k\Bigl(\frac1{k^2} - \frac1{(k+1)^2}\Bigr).$$
This is $(1 - \frac14) + (\frac24 - \frac29) + (\frac39 - \frac3{16}) + \cdots = \frac11 + \frac14 + \frac19 + \cdots = H^{(2)}_\infty = \pi^2/6$. (If
we hadn’t applied summation by parts earlier, we would have seen directly
that $S_n \sim \sum_{k\ge1}(\ln n)/k^2$, because $H_{n^k-1} - H_{n^{k-1}-1} \sim \ln n$; so summation by
parts didn’t help us to evaluate the leading term, although it did make some
of our other work easier.)
Now we have evaluated each of the $\Sigma$’s in (9.53), so we can put everything
together and get the answer to Golomb’s problem:
$$S_n = \frac{\pi^2}6\ln n + \gamma - \frac3{8n} + O\Bigl(\frac1{n^2}\Bigr). \tag{9.54}$$
Notice that this grows more slowly than our original hand-wavy estimate of
$C(\log n)^2$. Sometimes a discrete sum fails to obey a continuous intuition.
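Golomb's sum converges too slowly to add up term by term, but the summed-by-parts form lends itself to a numerical check. The sketch below is our own construction, not the book's: it writes $H_{n^k-1} = k\ln n + \gamma - \delta_k$, uses $\Sigma_1 = \pi^2/6$ and $\Sigma_2 = 1$ for the slowly converging part, and sums the rapidly shrinking corrections $\delta_k$ directly. The helper names, the cutoff $10^{18}$, and the switch to an asymptotic formula for large harmonic numbers are all assumptions made for illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329

def harmonic(N: int) -> float:
    """H_N: direct summation for small N, asymptotic expansion for large N."""
    if N < 10_000:
        return math.fsum(1.0 / j for j in range(1, N + 1))
    return math.log(N) + EULER_GAMMA + 1 / (2 * N) - 1 / (12 * N * N)

def golomb_sum(n: int) -> float:
    """S_n = sum_k H_{n^k - 1} (1/k^2 - 1/(k+1)^2), for radix n >= 2.

    The part (k ln n + gamma) is summed in closed form; only the small
    corrections delta_k = k ln n + gamma - H_{n^k - 1} are summed numerically.
    """
    total = (math.pi ** 2 / 6) * math.log(n) + EULER_GAMMA
    k = 1
    while n ** k < 10 ** 18:  # later corrections are below double precision
        delta = k * math.log(n) + EULER_GAMMA - harmonic(n ** k - 1)
        total -= delta * (1 / k ** 2 - 1 / (k + 1) ** 2)
        k += 1
    return total

def golomb_estimate(n: int) -> float:
    """Explicit terms of the final answer; the O(1/n^2) term is dropped."""
    return (math.pi ** 2 / 6) * math.log(n) + EULER_GAMMA - 3 / (8 * n)

for n in (10, 100):
    print(n, golomb_sum(n), golomb_estimate(n), (golomb_sum(n) - golomb_estimate(n)) * n * n)
```

The last printed column is the gap scaled by $n^2$; it should remain of modest size if the $O(1/n^2)$ claim holds.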
Problem 6: Big Phi.
Near the end of Chapter 4, we observed that the number of fractions in
the Farey series $\mathcal F_n$ is $1 + \Phi(n)$, where
$$\Phi(n) = \varphi(1) + \varphi(2) + \cdots + \varphi(n);$$
and we showed in (4.62) that
$$\Phi(n) = \frac12\sum_{k\ge1}\mu(k)\lfloor n/k\rfloor\lfloor1+n/k\rfloor. \tag{9.55}$$
Let us now try to estimate $\Phi(n)$ when $n$ is large. (It was sums like this that
led Bachmann to invent $O$-notation in the first place.)
Thinking BIG tells us that $\Phi(n)$ will probably be proportional to $n^2$.
For if the final factor were just $\lfloor n/k\rfloor$ instead of $\lfloor1+n/k\rfloor$, we would have
$|\Phi(n)| \le \frac12\sum_{k\ge1}\lfloor n/k\rfloor^2 \le \frac12\sum_{k\ge1}(n/k)^2 = \frac{\pi^2}{12}n^2$, because the Möbius function
$\mu(k)$ is either $-1$, 0, or $+1$. The additional ‘$1+{}$’ in that final factor
adds $\sum_{k\ge1}\mu(k)\lfloor n/k\rfloor$; but this is zero for $k > n$, so it cannot be more than
$nH_n = O(n\log n)$ in absolute value.
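This preliminary analysis indicates that we’ll find it advantageous to
write
$$\begin{aligned}\Phi(n) &= \frac12\sum_{k=1}^n\mu(k)\Bigl(\Bigl(\frac nk\Bigr) + O(1)\Bigr)^2 = \frac12\sum_{k=1}^n\mu(k)\Bigl(\Bigl(\frac nk\Bigr)^2 + O\Bigl(\frac nk\Bigr)\Bigr)\\
&= \frac12\sum_{k=1}^n\mu(k)\Bigl(\frac nk\Bigr)^2 + \sum_{k=1}^nO\Bigl(\frac nk\Bigr)\\
&= \frac12\sum_{k=1}^n\mu(k)\Bigl(\frac nk\Bigr)^2 + O(n\log n).\end{aligned}$$
This removes the floors; the remaining problem is to evaluate the unfloored
sum $\frac12\sum_{k=1}^n\mu(k)n^2/k^2$ with an accuracy of $O(n\log n)$; in other words, we
want to evaluate $\sum_{k=1}^n\mu(k)1/k^2$ with an accuracy of $O(n^{-1}\log n)$. But that’s
easy; we can simply run the sum all the way up to $k=\infty$, because the newly
added terms are
$$\begin{aligned}\sum_{k>n}\frac{\mu(k)}{k^2} = O\Bigl(\sum_{k>n}\frac1{k^2}\Bigr) &= O\Bigl(\sum_{k>n}\frac1{k(k-1)}\Bigr)\\
&= O\Bigl(\sum_{k>n}\Bigl(\frac1{k-1} - \frac1k\Bigr)\Bigr) = O\Bigl(\frac1n\Bigr).\end{aligned}$$
We proved in (7.88) that $\sum_{k\ge1}\mu(k)/k^z = 1/\zeta(z)$. Hence $\sum_{k\ge1}\mu(k)/k^2 = 1/\bigl(\sum_{k\ge1}1/k^2\bigr) = 6/\pi^2$, and we have our answer:
$$\Phi(n) = \frac3{\pi^2}n^2 + O(n\log n). \tag{9.56}$$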
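Both the Möbius form (9.55) and the estimate (9.56) are easy to test by machine. The following sketch uses two standard sieves (the function names are ours) to compute $\Phi(n)$ directly from totients and again from the Möbius sum, then compares the result with $3n^2/\pi^2$.

```python
import math

def totients_upto(n: int) -> list:
    """phi[k] = Euler's totient of k for 0 <= k <= n (phi[0] is unused)."""
    phi = list(range(n + 1))
    for p in range(2, n + 1):
        if phi[p] == p:  # still untouched, so p is prime
            for m in range(p, n + 1, p):
                phi[m] -= phi[m] // p
    return phi

def mobius_upto(n: int) -> list:
    """mu[k] = Moebius function of k for 1 <= k <= n (mu[0] is unused)."""
    mu = [1] * (n + 1)
    is_prime = [True] * (n + 1)
    for p in range(2, n + 1):
        if is_prime[p]:
            for m in range(p, n + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] = -mu[m]
            for m in range(p * p, n + 1, p * p):
                mu[m] = 0
    return mu

def big_phi(n: int) -> int:
    """Phi(n) = phi(1) + ... + phi(n), summed directly."""
    return sum(totients_upto(n)[1:])

def big_phi_mobius(n: int) -> int:
    """Phi(n) via (9.55); the terms with k > n vanish."""
    mu = mobius_upto(n)
    return sum(mu[k] * (n // k) * (1 + n // k) for k in range(1, n + 1)) // 2

for n in (10, 100, 1000, 10000):
    print(n, big_phi(n), big_phi_mobius(n), 3 * n * n / math.pi ** 2)
```

In these small cases the observed gap is far below $n\log n$, which is consistent with (9.56) but of course says nothing about the true size of the error term.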
9.4 TWO ASYMPTOTIC TRICKS
Now that we have some facility with $O$ manipulations, let’s look at
what we’ve done from a slightly higher perspective. Then we’ll have some
important weapons in our asymptotic arsenal, when we need to do battle
with tougher problems.
Trick 1: Bootstrapping.
When we estimated the $n$th prime $P_n$ in Problem 3 of Section 9.3, we
solved an asymptotic recurrence of the form
$$P_n = n\ln P_n\bigl(1 + O(1/\log n)\bigr).$$
We proved that $P_n = n\ln n + n\ln\ln n + O(n)$ by first using the recurrence to show
the weaker result $O(n^2)$. This is a special case of a general method called
bootstrapping, in which we solve a recurrence asymptotically by starting with
a rough estimate and plugging it into the recurrence; in this way we can often
derive better and better estimates, “pulling ourselves up by our bootstraps.”
Here’s another problem that illustrates bootstrapping nicely: What is the
asymptotic value of the coefficient $g_n = [z^n]\,G(z)$ in the generating function
$$G(z) = \exp\Bigl(\sum_{k\ge1}\frac{z^k}{k^2}\Bigr), \tag{9.57}$$
as $n\to\infty$? If we differentiate this equation with respect to $z$, we find
$$G'(z) = \sum_{n=0}^\infty ng_nz^{n-1} = \Bigl(\sum_{k\ge1}\frac{z^{k-1}}{k}\Bigr)G(z);$$
equating coefficients of $z^{n-1}$ on both sides gives the recurrence
$$ng_n = \sum_{0\le k<n}\frac{g_k}{n-k}. \tag{9.58}$$
Our problem is equivalent to finding an asymptotic formula for the solution
to (9.58), with the initial condition $g_0 = 1$. The first few values

    n      0    1    2     3       4         5           6
    g_n    1    1   3/4  19/36  107/288  641/2400  51103/259200

don’t reveal much of a pattern, and the integer sequence $\langle n!^2g_n\rangle$ doesn’t
appear in Sloane’s Handbook [270]; therefore a closed form for $g_n$ seems out
of the question, and asymptotic information is probably the best we can hope
to derive.
Our first handle on this problem is the observation that $0 < g_n \le 1$ for
all $n \ge 0$; this is easy to prove by induction. So we have a start:
$$g_n = O(1).$$
This equation can, in fact, be used to “prime the pump” for a bootstrapping
operation: Plugging it in on the right of (9.58) yields
$$ng_n = \sum_{0\le k<n}\frac{O(1)}{n-k} = H_nO(1) = O(\log n);$$
hence we have
$$g_n = O\Bigl(\frac{\log n}{n}\Bigr),\qquad\text{for } n > 1.$$
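And we can bootstrap yet again:
$$\begin{aligned}ng_n &= \frac1n + \sum_{0<k<n}\frac{O\bigl((1+\log k)/k\bigr)}{n-k}\\
&= \frac1n + \sum_{0<k<n}\frac{O(\log n)}{k(n-k)}\\
&= \frac1n + \sum_{0<k<n}\Bigl(\frac1k + \frac1{n-k}\Bigr)\frac{O(\log n)}{n}\\
&= \frac1n + \frac2nH_{n-1}O(\log n) = \frac1nO(\log n)^2,\end{aligned}$$
obtaining
$$g_n = O\Bigl(\frac{\log n}n\Bigr)^2. \tag{9.59}$$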
Will this go on forever? Perhaps we’ll have $g_n = O(n^{-1}\log n)^m$ for all $m$.
Actually no; we have just reached a point of diminishing returns. The
next attempt at bootstrapping involves the sum
$$\begin{aligned}\sum_{0<k<n}\frac1{k^2(n-k)} &= \sum_{0<k<n}\Bigl(\frac1{nk^2} + \frac1{n^2k} + \frac1{n^2(n-k)}\Bigr)\\
&= \frac1nH^{(2)}_{n-1} + \frac2{n^2}H_{n-1},\end{aligned}$$
which is $\Omega(n^{-1})$; so we cannot get an estimate for $g_n$ that falls below $\Omega(n^{-2})$.
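In fact, we now know enough about $g_n$ to apply our old trick of pulling
out the largest part:
$$\begin{aligned}ng_n &= \sum_{0\le k<n}\frac{g_k}n + \sum_{0\le k<n}g_k\Bigl(\frac1{n-k} - \frac1n\Bigr)\\
&= \frac1n\sum_{k\ge0}g_k - \frac1n\sum_{k\ge n}g_k + \frac1n\sum_{0\le k<n}\frac{kg_k}{n-k}.\end{aligned} \tag{9.60}$$
The first sum here is $G(1) = \exp\bigl(\frac11 + \frac14 + \frac19 + \cdots\bigr) = e^{\pi^2/6}$, because $G(z)$
converges for all $|z| \le 1$. The second sum is the tail of the first; we can get an
upper bound by using (9.59):
$$\sum_{k\ge n}g_k = O\Bigl(\sum_{k\ge n}\frac{(\log k)^2}{k^2}\Bigr) = O\Bigl(\frac{(\log n)^2}n\Bigr).$$
This last estimate follows because, for example,
$$\sum_{k>n}\frac{(\log k)^2}{k^2} < \sum_{m\ge1}\;\sum_{n^m<k\le n^{m+1}}\frac{(\log n^{m+1})^2}{k(k-1)} < \sum_{m\ge1}\frac{(m+1)^2(\log n)^2}{n^m}.$$
(Exercise 54 discusses a more general way to estimate such tails.)
The third sum in (9.60) is
$$O\Bigl(\sum_{0<k<n}\frac{(\log n)^2}{k(n-k)}\Bigr) = O\Bigl(\frac{(\log n)^3}n\Bigr),$$
by an argument that’s already familiar. So (9.60) proves that
$$g_n = \frac{e^{\pi^2/6}}{n^2} + O(\log n/n)^3. \tag{9.61}$$
Finally, we can feed this formula back into the recurrence, bootstrapping once
more; the result is
$$g_n = \frac{e^{\pi^2/6}}{n^2} + O(\log n/n^3). \tag{9.62}$$
(Exercise 23 peeks inside the remaining $O$ term.)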
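The bootstrapped result can be watched numerically. This sketch (the function name `g_values` and the choice of $n \le 1000$ are ours) runs the recurrence $ng_n = \sum_{0\le k<n} g_k/(n-k)$ with $g_0 = 1$, once exactly with rationals to reproduce the small table of values, and once in floating point to see $n^2g_n$ drift toward $e^{\pi^2/6}$.

```python
from fractions import Fraction
import math

def g_values(n_max: int, one=1.0) -> list:
    """g_0..g_{n_max} from n*g_n = sum_{0<=k<n} g_k/(n-k), starting at g_0 = one."""
    g = [one]
    for n in range(1, n_max + 1):
        g.append(sum(g[k] / (n - k) for k in range(n)) / n)
    return g

exact = g_values(6, one=Fraction(1))   # exact rationals for the table
print([str(x) for x in exact])

limit = math.exp(math.pi ** 2 / 6)     # G(1) = e^{pi^2/6}
approx = g_values(1000)                # floating point for larger n
for n in (10, 100, 1000):
    print(n, n * n * approx[n] / limit)
```

The printed ratios approach 1 only slowly, which fits a relative correction of order $(\log n)/n$.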
Trick 2: Trading tails.
We derived (9.62) in somewhat the same way we derived the asymptotic
value (9.56) of $\Phi(n)$: In both cases we started with a finite sum but got an
asymptotic value by considering an infinite sum. We couldn’t simply get the
infinite sum by introducing $O$ into the summand; we had to be careful to use
one approach when $k$ was small and another when $k$ was large.
Those derivations were special cases of an important three-step asymptotic
summation method we will now discuss in greater generality. Whenever
we want to estimate the value of $\sum_ka_k(n)$, we can try the following approach:
[Margin: (This important method was pioneered by Laplace [195′].)]
1 First break the sum into two disjoint ranges, $D_n$ and $T_n$. The summation
over $D_n$ should be the “dominant” part, in the sense that it includes
enough terms to determine the significant digits of the sum, when $n$ is
large. The summation over the other range $T_n$ should be just the “tail”
end, which contributes little to the overall total.
2 Find an asymptotic estimate
$$a_k(n) = b_k(n) + O\bigl(c_k(n)\bigr)$$
that is valid when $k \in D_n$. The $O$ bound need not hold when $k \in T_n$.
3 Now prove that each of the following three sums is small:
$$\Sigma_a(n) = \sum_{k\in T_n}a_k(n);\qquad \Sigma_b(n) = \sum_{k\in T_n}b_k(n);\qquad \Sigma_c(n) = \sum_{k\in D_n}|c_k(n)|. \tag{9.63}$$
If all three steps can be completed successfully, we have a good estimate:
$$\sum_{k\in D_n\cup T_n}a_k(n) = \sum_{k\in D_n\cup T_n}b_k(n) + O\bigl(\Sigma_a(n)\bigr) + O\bigl(\Sigma_b(n)\bigr) + O\bigl(\Sigma_c(n)\bigr).$$
Here’s why. We can “chop off” the tail of the given sum, getting a good
estimate in the range $D_n$ where a good estimate is necessary:
$$\sum_{k\in D_n}a_k(n) = \sum_{k\in D_n}\bigl(b_k(n) + O(c_k(n))\bigr) = \sum_{k\in D_n}b_k(n) + O\bigl(\Sigma_c(n)\bigr).$$
And we can replace the tail with another one, even though the new tail might
be a terrible approximation to the old, because the tails don’t really matter:
$$\begin{aligned}\sum_{k\in T_n}a_k(n) &= \sum_{k\in T_n}\bigl(b_k(n) - b_k(n) + a_k(n)\bigr)\\
&= \sum_{k\in T_n}b_k(n) + O\bigl(\Sigma_b(n)\bigr) + O\bigl(\Sigma_a(n)\bigr).\end{aligned}$$
[Margin: Asymptotics is the art of knowing where to be sloppy and where to be precise.]
When we evaluated the sum in (9.60), for example, we had
$$a_k(n) = [0\le k<n]\,g_k/(n-k),\qquad b_k(n) = g_k/n,\qquad c_k(n) = kg_k/n(n-k);$$
the ranges of summation were
$$D_n = \{0,1,\ldots,n-1\},\qquad T_n = \{n,n+1,\ldots\};$$
and we found that
$$\Sigma_a(n) = 0,\qquad \Sigma_b(n) = O\bigl((\log n)^2/n^2\bigr),\qquad \Sigma_c(n) = O\bigl((\log n)^3/n^2\bigr).$$
This led to (9.61).
Similarly, when we estimated $\Phi(n)$ in (9.55) we had
$$a_k(n) = \mu(k)\lfloor n/k\rfloor\lfloor1+n/k\rfloor,\qquad b_k(n) = \mu(k)n^2/k^2,\qquad c_k(n) = n/k;$$
$$D_n = \{1,2,\ldots,n\},\qquad T_n = \{n+1,n+2,\ldots\}.$$
We derived (9.56) by observing that $\Sigma_a(n) = 0$, $\Sigma_b(n) = O(n)$, and $\Sigma_c(n) = O(n\log n)$.
Here’s another example where tail switching is effective. (Unlike our
previous examples, this one illustrates the trick in its full generality, with
$\Sigma_a(n) \ne 0$.) We seek the asymptotic value of
$$\sum_{k\ge0}\frac{\ln(n+2^k)}{k!}.$$
[Margin: Also, horses switch their tails when feeding time approaches.]
The big contributions to this sum occur when $k$ is small, because of the $k!$ in
the denominator. In this range we have
$$\ln(n+2^k) = \ln n + \frac{2^k}n - \frac{2^{2k}}{2n^2} + O\Bigl(\frac{2^{3k}}{n^3}\Bigr). \tag{9.64}$$
We can prove that this estimate holds for $0 \le k < \lfloor\lg n\rfloor$, since the original
terms that have been truncated with $O$ are bounded by the convergent series
$$\sum_{m\ge3}\frac{2^{km}}{mn^m} \le \frac{2^{3k}}{n^3}\Bigl(\frac13 + \frac1{4\cdot2} + \frac1{5\cdot4} + \cdots\Bigr).$$
(In this range, $2^k/n \le 2^{\lfloor\lg n\rfloor-1}/n \le \frac12$.)
Therefore we can apply the three-step method just described, with
$$a_k(n) = \ln(n+2^k)/k!,\qquad b_k(n) = (\ln n + 2^k/n - 4^k/2n^2)/k!,\qquad c_k(n) = 8^k/n^3k!;$$
$$D_n = \{0,1,\ldots,\lfloor\lg n\rfloor-1\},\qquad T_n = \{\lfloor\lg n\rfloor,\lfloor\lg n\rfloor+1,\ldots\}.$$
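All we have to do is find good bounds on the three $\Sigma$’s in (9.63), and we’ll
know that $\sum_{k\ge0}a_k(n) \approx \sum_{k\ge0}b_k(n)$.
The error we have committed in the dominant part of the sum, $\Sigma_c(n) = \sum_{k\in D_n}8^k/n^3k!$, is obviously bounded by $\sum_{k\ge0}8^k/n^3k! = e^8/n^3$, so it can be
replaced by $O(n^{-3})$. The new tail error is
$$|\Sigma_b(n)| = \Bigl|\sum_{k\ge\lfloor\lg n\rfloor}b_k(n)\Bigr| < \sum_{k\ge\lfloor\lg n\rfloor}\frac{\ln n + 2^k + 4^k}{k!} < \frac{\ln n + 2^{\lfloor\lg n\rfloor} + 4^{\lfloor\lg n\rfloor}}{\lfloor\lg n\rfloor!}\sum_{k\ge0}\frac{4^k}{k!} = O\Bigl(\frac{n^2}{\lfloor\lg n\rfloor!}\Bigr).$$
[Margin: “We may not be big, but we’re small.”]
Since $\lfloor\lg n\rfloor!$ grows faster than any power of $n$, this minuscule error is overwhelmed
by $\Sigma_c(n) = O(n^{-3})$. The error that comes from the original tail,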
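$$\Sigma_a(n) = \sum_{k\ge\lfloor\lg n\rfloor}a_k(n) < \sum_{k\ge\lfloor\lg n\rfloor}\frac{k+\ln n}{k!},$$
is smaller yet.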
Finally, it’s easy to sum $\sum_{k\ge0}b_k(n)$ in closed form, and we have obtained
the desired asymptotic formula:
$$\sum_{k\ge0}\frac{\ln(n+2^k)}{k!} = e\ln n + \frac{e^2}n - \frac{e^4}{2n^2} + O\Bigl(\frac1{n^3}\Bigr). \tag{9.65}$$
The method we’ve used makes it clear that, in fact,
$$\sum_{k\ge0}\frac{\ln(n+2^k)}{k!} = e\ln n + \sum_{k=1}^{m-1}(-1)^{k+1}\frac{e^{2^k}}{kn^k} + O\Bigl(\frac1{n^m}\Bigr), \tag{9.66}$$
for any fixed $m > 0$. (This is a truncation of a series that diverges for all
fixed $n$ if we let $m\to\infty$.)
There’s only one flaw in our solution: We were too cautious. We derived
(9.64) on the assumption that $k < \lfloor\lg n\rfloor$, but exercise 53 proves that
the stated estimate is actually valid for all values of k. If we had known
the stronger general result, we wouldn’t have had to use the two-tail trick;
we could have gone directly to the final formula! But later we’ll encounter
problems where exchange of tails is the only decent approach available.
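The three-term formula (9.65) can be compared with a direct evaluation of the sum. In the sketch below the function names and the 80-term cutoff are our own choices; the comparison bound $e^8/(3n^3)$ comes from the elementary inequality $0 \le \ln(1+x) - x + x^2/2 \le x^3/3$ for $x \ge 0$, which we assume here rather than take from the text.

```python
import math

def tail_sum(n: int, terms: int = 80) -> float:
    """sum_{k>=0} ln(n + 2^k)/k!, truncated once k! makes the terms negligible."""
    return math.fsum(math.log(n + 2 ** k) / math.factorial(k) for k in range(terms))

def estimate(n: int) -> float:
    """Explicit terms of (9.65): e ln n + e^2/n - e^4/(2 n^2)."""
    return math.e * math.log(n) + math.e ** 2 / n - math.e ** 4 / (2 * n * n)

for n in (100, 1000, 10000):
    gap = tail_sum(n) - estimate(n)
    print(n, gap, math.exp(8) / (3 * n ** 3))
```

For each $n$ the gap stays below the bound printed next to it, in line with the $O(1/n^3)$ error term.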
9.5 EULER’S SUMMATION FORMULA
And now for our next trick (which is, in fact, the last important
technique that will be discussed in this book) we turn to a general method of
approximating sums that was first published by Leonhard Euler [82] in 1732.
(The idea is sometimes also associated with the name of Colin Maclaurin,
a professor of mathematics at Edinburgh who discovered it independently a
short time later [211, page 305].)
Here’s the formula:
$$\sum_{a\le k<b}f(k) = \int_a^bf(x)\,dx + \sum_{k=1}^m\frac{B_k}{k!}f^{(k-1)}(x)\Big|_a^b + R_m, \tag{9.67}$$
$$\text{where}\quad R_m = (-1)^{m+1}\int_a^b\frac{B_m(\{x\})}{m!}f^{(m)}(x)\,dx,\qquad\text{integers } a\le b;\ \text{integer } m\ge1. \tag{9.68}$$
On the left is a typical sum that we might want to evaluate. On the right is
another expression for that sum, involving integrals and derivatives. If f(x) is
a sufficiently “smooth” function, it will have $m$ derivatives $f'(x)$, ..., $f^{(m)}(x)$,
and this formula turns out to be an identity. The right-hand side is often an
excellent approximation to the sum on the left, in the sense that the remainder
$R_m$ is often small. For example, we’ll see that Stirling’s approximation
for $n!$ is a consequence of Euler’s summation formula; so is our asymptotic
approximation for the harmonic number $H_n$.
The numbers $B_k$ in (9.67) are the Bernoulli numbers that we met in
Chapter 6; the function $B_m(\{x\})$ in (9.68) is the Bernoulli polynomial that we
met in Chapter 7. The notation $\{x\}$ stands for the fractional part $x - \lfloor x\rfloor$, as
in Chapter 3. Euler’s summation formula sort of brings everything together.
Let’s recall the values of small Bernoulli numbers, since it’s always handy
to have them listed near Euler’s general formula:
$$B_0 = 1,\quad B_1 = -\tfrac12,\quad B_2 = \tfrac16,\quad B_4 = -\tfrac1{30},\quad B_6 = \tfrac1{42},\quad B_8 = -\tfrac1{30};$$
$$B_3 = B_5 = B_7 = B_9 = B_{11} = \cdots = 0.$$
Jakob Bernoulli discovered these numbers when studying the sums of powers
of integers, and Euler’s formula explains why: If we set $f(x) = x^{m-1}$, we have
$f^{(m)}(x) = 0$; hence $R_m = 0$, and (9.67) reduces to
$$\sum_{a\le k<b}k^{m-1} = \frac1m\sum_{k=0}^m\binom mkB_k\cdot(b^{m-k} - a^{m-k}).$$
For example, when $m = 3$ we have our favorite example of summation:
$$\sum_{0\le k<n}k^2 = \frac13\Bigl(\binom30B_0n^3 + \binom31B_1n^2 + \binom32B_2n\Bigr) = \frac{n^3}3 - \frac{n^2}2 + \frac n6.$$
(This is the last time we shall derive this famous formula in this book.)
[Margin: All good things must come to an end.]
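The power-sum identity is exact, so it can be verified with rational arithmetic. This sketch (function names ours) generates Bernoulli numbers from the usual recurrence $\sum_{k\le m}\binom{m+1}{k}B_k = [m=0]$, which gives $B_1 = -\frac12$ as in the list above, and compares the Bernoulli-number formula with brute-force summation.

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(m_max: int) -> list:
    """B_0..B_{m_max} with B_1 = -1/2, from sum_{k<=m} C(m+1,k) B_k = [m == 0]."""
    B = [Fraction(1)]
    for m in range(1, m_max + 1):
        B.append(-sum(comb(m + 1, k) * B[k] for k in range(m)) / (m + 1))
    return B

def power_sum(m: int, a: int, b: int) -> Fraction:
    """sum_{a<=k<b} k^(m-1) as (1/m) sum_k C(m,k) B_k (b^(m-k) - a^(m-k))."""
    B = bernoulli_numbers(m)
    return sum(comb(m, k) * B[k] * (b ** (m - k) - a ** (m - k))
               for k in range(m + 1)) / m

print([str(x) for x in bernoulli_numbers(8)])
print(power_sum(3, 0, 10), sum(k * k for k in range(10)))
```

The case $m = 3$, $a = 0$ reproduces $n^3/3 - n^2/2 + n/6$; the general check below it covers nonzero lower limits as well.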
Before we prove Euler’s formula, let’s look at a high-level reason (due
to Lagrange [192]) why such a formula ought to exist. Chapter 2 defines the
difference operator $\Delta$ and explains that $\sum$ is the inverse of $\Delta$, just as $\int$ is the
inverse of the derivative operator $D$. We can express $\Delta$ in terms of $D$ using
Taylor’s formula as follows:
$$f(x+\epsilon) = f(x) + \frac{f'(x)}{1!}\epsilon + \frac{f''(x)}{2!}\epsilon^2 + \cdots.$$
Setting $\epsilon = 1$ tells us that
$$\begin{aligned}\Delta f(x) = f(x+1) - f(x) &= f'(x)/1! + f''(x)/2! + f'''(x)/3! + \cdots\\
&= (D/1! + D^2/2! + D^3/3! + \cdots)f(x) = (e^D - 1)f(x).\end{aligned} \tag{9.69}$$
Here $e^D$ stands for the differential operation $1 + D/1! + D^2/2! + D^3/3! + \cdots$.
Since $\Delta = e^D - 1$, the inverse operator $\Sigma = 1/\Delta$ should be $1/(e^D - 1)$; and
we know from Table 337 that $z/(e^z - 1) = \sum_{k\ge0}B_kz^k/k!$ is a power series
involving Bernoulli numbers. Thus
$$\Sigma = \frac{B_0}D + \frac{B_1}{1!} + \frac{B_2}{2!}D + \frac{B_3}{3!}D^2 + \cdots = \int + \sum_{k\ge1}\frac{B_k}{k!}D^{k-1}. \tag{9.70}$$
Applying this operator equation to $f(x)$ and attaching limits yields
$$\sum\nolimits_a^bf(x)\,\delta x = \int_a^bf(x)\,dx + \sum_{k\ge1}\frac{B_k}{k!}f^{(k-1)}(x)\Big|_a^b, \tag{9.71}$$
which is exactly Euler’s summation formula (9.67) without the remainder
term. (Euler did not, in fact, consider the remainder, nor did anybody else
until S. D. Poisson [236] published an important memoir about approximate
summation in 1823. The remainder term is important, because the infinite
sum $\sum_{k\ge1}(B_k/k!)f^{(k-1)}(x)\big|_a^b$ often diverges. Our derivation of (9.71) has been
purely formal, without regard to convergence.)
Now let’s prove (9.67), with the remainder included. It suffices to prove
the case $a = 0$ and $b = 1$, namely
$$f(0) = \int_0^1f(x)\,dx + \sum_{k=1}^m\frac{B_k}{k!}f^{(k-1)}(x)\Big|_0^1 - (-1)^m\int_0^1\frac{B_m(x)}{m!}f^{(m)}(x)\,dx,$$
because we can then replace $f(x)$ by $f(x+l)$ for any integer $l$, getting
$$f(l) = \int_l^{l+1}f(x)\,dx + \sum_{k=1}^m\frac{B_k}{k!}f^{(k-1)}(x)\Big|_l^{l+1} - (-1)^m\int_l^{l+1}\frac{B_m(\{x\})}{m!}f^{(m)}(x)\,dx.$$
The general formula (9.67) is just the sum of this identity over the range
$a \le l < b$, because intermediate terms telescope nicely.
The proof when $a = 0$ and $b = 1$ is by induction on $m$, starting with $m = 1$:
$$f(0) = \int_0^1f(x)\,dx - \tfrac12\bigl(f(1) - f(0)\bigr) + \int_0^1\bigl(x - \tfrac12\bigr)f'(x)\,dx.$$
(The Bernoulli polynomial $B_m(x)$ is defined by the equation
$$B_m(x) = \binom m0B_0x^m + \binom m1B_1x^{m-1} + \cdots + \binom mmB_mx^0 \tag{9.72}$$
in general, hence $B_1(x) = x - \frac12$ in particular.) In other words, we want to
prove that
$$\frac{f(0) + f(1)}2 = \int_0^1f(x)\,dx + \int_0^1\bigl(x - \tfrac12\bigr)f'(x)\,dx.$$
But this is just a special case of the formula
$$u(x)v(x)\Big|_0^1 = \int_0^1u(x)\,dv(x) + \int_0^1v(x)\,du(x) \tag{9.73}$$
for integration by parts, with $u(x) = f(x)$ and $v(x) = x - \frac12$. Hence the case
$m = 1$ is easy.
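To pass from $m-1$ to $m$ and complete the induction when $m > 1$, we
need to show that $R_{m-1} = (B_m/m!)f^{(m-1)}(x)\big|_0^1 + R_m$, namely that
$$(-1)^m\int_0^1\frac{B_{m-1}(x)}{(m-1)!}f^{(m-1)}(x)\,dx = \frac{B_m}{m!}f^{(m-1)}(x)\Big|_0^1 - (-1)^m\int_0^1\frac{B_m(x)}{m!}f^{(m)}(x)\,dx.$$
This reduces to the equation
$$(-1)^mB_mf^{(m-1)}(x)\Big|_0^1 = m\int_0^1B_{m-1}(x)f^{(m-1)}(x)\,dx + \int_0^1B_m(x)f^{(m)}(x)\,dx.$$
Once again (9.73) applies to these two integrals, with $u(x) = f^{(m-1)}(x)$ and
$v(x) = B_m(x)$, because the derivative of the Bernoulli polynomial (9.72) is
$$\frac d{dx}\sum_k\binom mkB_kx^{m-k} = \sum_k\binom mk(m-k)B_kx^{m-k-1} = m\sum_k\binom{m-1}kB_kx^{m-1-k} = mB_{m-1}(x). \tag{9.74}$$
[Margin: Will the authors never get serious?]
(The absorption identity (5.7) was useful here.) Therefore the required formula
will hold if and only if
$$(-1)^mB_mf^{(m-1)}(x)\Big|_0^1 = B_m(x)f^{(m-1)}(x)\Big|_0^1.$$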
In other words, we need to have
$$(-1)^mB_m = B_m(1) = B_m(0),\qquad\text{for } m > 1. \tag{9.75}$$
This is a bit embarrassing, because $B_m(0)$ is obviously equal to $B_m$, not
to $(-1)^mB_m$. But there’s no problem really, because $m > 1$; we know that
$B_m$ is zero when $m$ is odd. (Still, that was a close call.)
To complete the proof of Euler’s summation formula we need to show
that $B_m(1) = B_m(0)$, which is the same as saying that
$$\sum_k\binom mkB_k = B_m,\qquad\text{for } m > 1.$$
But this is just the definition of Bernoulli numbers, (6.79), so we’re done.
The identity $B_m'(x) = mB_{m-1}(x)$ implies that
$$\int_0^1B_m(x)\,dx = \frac{B_{m+1}(1) - B_{m+1}(0)}{m+1},$$
and we know now that this integral is zero when $m \ge 1$. Hence the remainder
term in Euler’s formula,
$$R_m = \frac{(-1)^{m+1}}{m!}\int_a^bB_m(\{x\})f^{(m)}(x)\,dx,$$
multiplies $f^{(m)}(x)$ by a function $B_m(\{x\})$ whose average value is zero. This
means that $R_m$ has a reasonable chance of being small.
Let’s look more closely at $B_m(x)$ for $0 \le x \le 1$, since $B_m(x)$ governs the
behavior of $R_m$. Here are the graphs for $B_m(x)$ for the first twelve values of $m$:
[Figure: graphs of $B_m(x)$, $B_{4+m}(x)$, and $B_{8+m}(x)$ on $0 \le x \le 1$, in columns $m=1$, $m=2$, $m=3$, $m=4$.]
Although $B_3(x)$ through $B_9(x)$ are quite small, the Bernoulli polynomials
and numbers ultimately get quite large. Fortunately $R_m$ has a compensating
factor $1/m!$, which helps to calm things down.
The graph of $B_m(x)$ begins to look very much like a sine wave when
$m \ge 3$; exercise 58 proves that $B_m(x)$ can in fact be well approximated by a
negative multiple of $\cos(2\pi x - \frac12\pi m)$, with relative error $1/2^m$.
In general, $B_{4k+1}(x)$ is negative for $0 < x < \frac12$ and positive for $\frac12 < x < 1$.
Therefore its integral, $B_{4k+2}(x)/(4k+2)$, decreases for $0 < x < \frac12$ and increases
for $\frac12 < x < 1$. Moreover, we have
$$B_{4k+1}(1-x) = -B_{4k+1}(x),\qquad\text{for } 0 \le x \le 1,$$
and it follows that
$$B_{4k+2}(1-x) = B_{4k+2}(x),\qquad\text{for } 0 \le x \le 1.$$
The constant term $B_{4k+2}$ causes the integral $\int_0^1B_{4k+2}(x)\,dx$ to be zero; hence
$B_{4k+2} > 0$. The integral of $B_{4k+2}(x)$ is $B_{4k+3}(x)/(4k+3)$, which must therefore
be positive when $0 < x < \frac12$ and negative when $\frac12 < x < 1$; furthermore
$B_{4k+3}(1-x) = -B_{4k+3}(x)$, so $B_{4k+3}(x)$ has the properties stated for $B_{4k+1}(x)$,
but negated. Therefore $B_{4k+4}(x)$ has the properties stated for $B_{4k+2}(x)$, but
negated. Therefore $B_{4k+5}(x)$ has the properties stated for $B_{4k+1}(x)$; we have
completed a cycle that establishes the stated properties inductively for all $k$.
According to this analysis, the maximum value of $B_{2m}(x)$ must occur
either at $x = 0$ or at $x = \frac12$. Exercise 17 proves that
$$B_{2m}(\tfrac12) = (2^{1-2m} - 1)B_{2m}; \tag{9.76}$$
hence we have
$$|B_{2m}(\{x\})| \le |B_{2m}|. \tag{9.77}$$
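The facts about Bernoulli polynomials used in this proof are finite identities, so they can be checked exactly for small $m$. The sketch below (helper names ours, and the sample grid of sixteenths is an arbitrary choice) builds $B_m(x)$ from the defining sum (9.72) and tests the endpoint identity (9.75), the midpoint value (9.76), and the bound (9.77) at sample points.

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(m_max: int) -> list:
    """B_0..B_{m_max} with B_1 = -1/2, from sum_{k<=m} C(m+1,k) B_k = [m == 0]."""
    B = [Fraction(1)]
    for m in range(1, m_max + 1):
        B.append(-sum(comb(m + 1, k) * B[k] for k in range(m)) / (m + 1))
    return B

B = bernoulli_numbers(12)

def bernoulli_poly(m: int, x: Fraction) -> Fraction:
    """B_m(x) = sum_k C(m,k) B_k x^(m-k), as in (9.72)."""
    return sum(comb(m, k) * B[k] * x ** (m - k) for k in range(m + 1))

half = Fraction(1, 2)
for m in range(1, 7):
    lhs = bernoulli_poly(2 * m, half)
    rhs = (Fraction(2) ** (1 - 2 * m) - 1) * B[2 * m]
    print(2 * m, lhs, rhs)
```

Each printed pair agrees, as (9.76) predicts; the assertions in the accompanying test cover (9.75) and sampled instances of (9.77).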
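This can be used to establish a useful upper bound on the remainder in Euler’s
summation formula, because we know from (6.89) that
$$\frac{|B_{2m}|}{(2m)!} = \frac2{(2\pi)^{2m}}\sum_{k\ge1}\frac1{k^{2m}} = O\bigl((2\pi)^{-2m}\bigr),\qquad\text{when } m > 0.$$
Therefore we can rewrite Euler’s formula (9.67) as follows:
$$\sum_{a\le k<b}f(k) = \int_a^bf(x)\,dx - \tfrac12f(x)\Big|_a^b + \sum_{k=1}^m\frac{B_{2k}}{(2k)!}f^{(2k-1)}(x)\Big|_a^b + O\bigl((2\pi)^{-2m}\bigr)\int_a^b|f^{(2m)}(x)|\,dx. \tag{9.78}$$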
For example, if $f(x) = e^x$, all derivatives are the same and this formula tells
us that $\sum_{a\le k<b}e^k = (e^b - e^a)\bigl(1 - \frac12 + B_2/2! + B_4/4! + \cdots + B_{2m}/(2m)!\bigr) + O\bigl((2\pi)^{-2m}\bigr)$. Of
course, we know that this sum is actually a geometric series,
equal to $(e^b - e^a)/(e - 1) = (e^b - e^a)\sum_{k\ge0}B_k/k!$.
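For the exponential example the claim boils down to how fast the truncated series $1 - \frac12 + B_2/2! + \cdots + B_{2m}/(2m)!$ approaches $1/(e-1)$. The sketch below (names ours) prints the truncation error next to $(2\pi)^{-2m}$ for a few values of $m$; it is an illustration for this one function, not a general test of the bound.

```python
from fractions import Fraction
from math import comb, e, factorial, pi

def bernoulli_numbers(m_max: int) -> list:
    """B_0..B_{m_max} with B_1 = -1/2, from sum_{k<=m} C(m+1,k) B_k = [m == 0]."""
    B = [Fraction(1)]
    for m in range(1, m_max + 1):
        B.append(-sum(comb(m + 1, k) * B[k] for k in range(m)) / (m + 1))
    return B

def truncated_series(m: int) -> float:
    """1 - 1/2 + B_2/2! + B_4/4! + ... + B_{2m}/(2m)!, evaluated exactly then rounded."""
    B = bernoulli_numbers(2 * m)
    s = Fraction(1) + B[1] + sum(B[2 * k] / factorial(2 * k) for k in range(1, m + 1))
    return float(s)

target = 1 / (e - 1)
for m in range(1, 6):
    print(m, abs(truncated_series(m) - target), (2 * pi) ** (-2 * m))
```

Each extra Bernoulli term shrinks the error by a factor of roughly $(2\pi)^2 \approx 39$ in this example.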
If $f^{(2m)}(x) \ge 0$ for $a \le x \le b$, the integral $\int_a^b|f^{(2m)}(x)|\,dx$ is just $f^{(2m-1)}(x)\big|_a^b$,
so we have
$$|R_{2m}| \le \Bigl|\frac{B_{2m}}{(2m)!}f^{(2m-1)}(x)\Big|_a^b\Bigr|;$$
in other words, the remainder is bounded by the magnitude of the final term
(the term just before the remainder), in this case. We can give an even better
estimate if we know that
$$f^{(2m+2)}(x) \ge 0\quad\text{and}\quad f^{(2m+4)}(x) \ge 0,\qquad\text{for } a \le x \le b. \tag{9.79}$$
For it turns out that this implies the relation
$$R_{2m} = \theta_m\frac{B_{2m+2}}{(2m+2)!}f^{(2m+1)}(x)\Big|_a^b,\qquad\text{for some } 0 \le \theta_m \le 1; \tag{9.80}$$
in other words, the remainder will then lie between 0 and the first discarded
term in (9.78), that is, the term that would follow the final term if we increased $m$.
Here’s the proof: Euler’s summation formula is valid for all $m$, and
$B_{2m+1} = 0$ when $m > 0$; hence $R_{2m} = R_{2m+1}$, and the first discarded term
must be
$$R_{2m} - R_{2m+2}.$$
We therefore want to show that $R_{2m}$ lies between 0 and $R_{2m} - R_{2m+2}$; and
this is true if and only if $R_{2m}$ and $R_{2m+2}$ have opposite signs. We claim that
$$f^{(2m+2)}(x) \ge 0\ \text{ for } a \le x \le b\quad\text{implies}\quad(-1)^mR_{2m} \ge 0. \tag{9.81}$$
This, together with (9.79), will prove that $R_{2m}$ and $R_{2m+2}$ have opposite signs,
so the proof of (9.80) will be complete.
It’s not difficult to prove (9.81) if we recall the definition of $R_{2m+1}$ and
the facts we proved about the graph of $B_{2m+1}(x)$. Namely, we have
$$R_{2m} = R_{2m+1} = \int_a^b\frac{B_{2m+1}(\{x\})}{(2m+1)!}f^{(2m+1)}(x)\,dx,$$
and $f^{(2m+1)}(x)$ is increasing because its derivative $f^{(2m+2)}(x)$ is positive. (More
precisely, $f^{(2m+1)}(x)$ is nondecreasing because its derivative is nonnegative.)
The graph of $B_{2m+1}(\{x\})$ looks like $(-1)^{m+1}$ times a sine wave, so it is geometrically
obvious that the second half of each sine wave is more influential
than the first half when it is multiplied by an increasing function. This makes
$(-1)^mR_{2m+1} \ge 0$, as desired. Exercise 16 proves the result formally.
9.6 FINAL SUMMATIONS
Now comes the summing up, as we prepare to conclude this book.
We will apply Euler’s summation formula to some interesting and important
examples.
Summation 1: This one is too easy.
But first we will consider an interesting unimportant example, namely
a sum that we already know how to do. Let’s see what Euler’s summation
formula tells us if we apply it to the telescoping sum
$$S_n = \sum_{1\le k<n}\frac1{k(k+1)} = \sum_{1\le k<n}\Bigl(\frac1k - \frac1{k+1}\Bigr) = 1 - \frac1n.$$
It can’t hurt to embark on our first serious application of Euler’s formula with
the asymptotic equivalent of training wheels.
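We might as well start by writing the function $f(x) = 1/x(x+1)$ in partial
fraction form,
$$f(x) = \frac1x - \frac1{x+1},$$
since this makes it easier to integrate and differentiate. Indeed, we have
$f'(x) = -1/x^2 + 1/(x+1)^2$ and $f''(x) = 2/x^3 - 2/(x+1)^3$; in general
$$f^{(k)}(x) = (-1)^kk!\Bigl(\frac1{x^{k+1}} - \frac1{(x+1)^{k+1}}\Bigr),\qquad\text{for } k \ge 0.$$
Furthermore
$$\int_1^nf(x)\,dx = \ln x - \ln(x+1)\Big|_1^n = \ln\frac{2n}{n+1}.$$
Plugging this into the summation formula (9.67) gives
$$S_n = \ln\frac{2n}{n+1} - \sum_{k=1}^m(-1)^k\frac{B_k}k\Bigl(\frac1{n^k} - \frac1{(n+1)^k} - 1 + \frac1{2^k}\Bigr) + R_m(n),$$
$$\text{where}\quad R_m(n) = -\int_1^nB_m(\{x\})\Bigl(\frac1{x^{m+1}} - \frac1{(x+1)^{m+1}}\Bigr)dx.$$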
For example, the right-hand side when $m = 4$ is
$$\ln\frac{2n}{n+1} - \frac12\Bigl(\frac1n - \frac1{n+1} - \frac12\Bigr) - \frac1{12}\Bigl(\frac1{n^2} - \frac1{(n+1)^2} - \frac34\Bigr) + \frac1{120}\Bigl(\frac1{n^4} - \frac1{(n+1)^4} - \frac{15}{16}\Bigr) + R_4(n).$$
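This is kind of a mess; it certainly doesn’t look like the real answer $1 - n^{-1}$.
But let’s keep going anyway, to see what we’ve got. We know how to expand
the right-hand terms in negative powers of $n$ up to, say, $O(n^{-5})$:
$$\begin{aligned}\ln\frac n{n+1} &= -n^{-1} + \tfrac12n^{-2} - \tfrac13n^{-3} + \tfrac14n^{-4} + O(n^{-5});\\
\frac1{n+1} &= n^{-1} - n^{-2} + n^{-3} - n^{-4} + O(n^{-5});\\
\frac1{(n+1)^2} &= n^{-2} - 2n^{-3} + 3n^{-4} + O(n^{-5});\\
\frac1{(n+1)^4} &= n^{-4} + O(n^{-5}).\end{aligned}$$
Therefore the terms on the right of our approximation add up to
$$\begin{aligned}&\ln2 + \tfrac14 + \tfrac1{16} - \tfrac1{128} + \bigl(-1 - \tfrac12 + \tfrac12\bigr)n^{-1} + \bigl(\tfrac12 - \tfrac12 - \tfrac1{12} + \tfrac1{12}\bigr)n^{-2}\\
&\qquad{} + \bigl(-\tfrac13 + \tfrac12 - \tfrac2{12}\bigr)n^{-3} + \bigl(\tfrac14 - \tfrac12 + \tfrac3{12} + \tfrac1{120} - \tfrac1{120}\bigr)n^{-4} + R_4(n)\\
&\quad= \ln2 + \tfrac{39}{128} - n^{-1} + R_4(n) + O(n^{-5}).\end{aligned}$$
The coefficients of $n^{-2}$, $n^{-3}$, and $n^{-4}$ cancel nicely, as they should.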
If all were well with the world, we would be able to show that $R_4(n)$ is
asymptotically small, maybe $O(n^{-5})$, and we would have an approximation
to the sum. But we can’t possibly show this, because we happen to know that
the correct constant term is 1, not $\ln2 + \frac{39}{128}$ (which is approximately 0.9978).
So $R_4(n)$ is actually equal to $\frac{89}{128} - \ln2 + O(n^{-4})$, but Euler’s summation
formula doesn’t tell us this.
In other words, we lose.
One way to try fixing things is to notice that the constant terms in the
approximation form a pattern, if we let $m$ get larger and larger:
$$\ln2 - \tfrac12B_1 + \tfrac12\cdot\tfrac34B_2 - \tfrac13\cdot\tfrac78B_3 + \tfrac14\cdot\tfrac{15}{16}B_4 - \tfrac15\cdot\tfrac{31}{32}B_5 + \cdots.$$
Perhaps we can show that this series approaches 1 as the number of terms
becomes infinite? But no; the Bernoulli numbers get very large. For example,
$B_{22} = \frac{854513}{138} > 6192$; therefore $|R_{22}(n)|$ will be much larger than $|R_4(n)|$.
We lose totally.
There is a way out, however, and this escape route will turn out to be
important in other applications of Euler’s formula. The key is to notice that
$R_4(n)$ approaches a definite limit as $n\to\infty$:
$$\lim_{n\to\infty}R_4(n) = -\int_1^\infty B_4(\{x\})\Bigl(\frac1{x^5} - \frac1{(x+1)^5}\Bigr)dx = R_4(\infty).$$
The integral $\int_1^\infty B_m(\{x\})f^{(m)}(x)\,dx$ will exist whenever $f^{(m)}(x) = O(x^{-2})$ as
$x\to\infty$, and in this case $f^{(4)}(x)$ surely qualifies. Moreover, we have
$$\begin{aligned}R_4(n) &= R_4(\infty) + \int_n^\infty B_4(\{x\})\Bigl(\frac1{x^5} - \frac1{(x+1)^5}\Bigr)dx\\
&= R_4(\infty) + O\Bigl(\int_n^\infty x^{-6}\,dx\Bigr) = R_4(\infty) + O(n^{-5}).\end{aligned}$$
Thus we have used Euler’s summation formula to prove that
$$\begin{aligned}\sum_{1\le k<n}\frac1{k(k+1)} &= \ln2 + \tfrac{39}{128} - n^{-1} + R_4(\infty) + O(n^{-5})\\
&= C - n^{-1} + O(n^{-5})\end{aligned}$$
for some constant $C$. We do not know what the constant is (some other
method must be used to establish it), but Euler’s summation formula is able
to let us deduce that the constant exists.
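Because the sum is known exactly here, the remainder $R_4(n)$ can be computed by subtraction and watched as $n$ grows. In this sketch the function names are ours; `euler_rhs_m4` is the $m = 4$ right-hand side without the remainder, and the limiting value $\frac{89}{128} - \ln 2$ is the one that follows from knowing the sum equals $1 - 1/n$.

```python
import math

def euler_rhs_m4(n: int) -> float:
    """The m = 4 right-hand side for S_n, with the remainder R_4(n) left out."""
    return (math.log(2 * n / (n + 1))
            - (1 / 2) * (1 / n - 1 / (n + 1) - 1 / 2)
            - (1 / 12) * (1 / n ** 2 - 1 / (n + 1) ** 2 - 3 / 4)
            + (1 / 120) * (1 / n ** 4 - 1 / (n + 1) ** 4 - 15 / 16))

def remainder_r4(n: int) -> float:
    """R_4(n), obtained from the known closed form S_n = 1 - 1/n."""
    return (1 - 1 / n) - euler_rhs_m4(n)

limit_r4 = 89 / 128 - math.log(2)   # value forced by the constant term C = 1
for n in (2, 10, 100):
    print(n, remainder_r4(n), remainder_r4(n) - limit_r4)
```

The remainder is nowhere near zero, but it settles very quickly to a constant, which is exactly the behavior the argument above exploits.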
Suppose we had chosen a much larger value of $m$. Then the same reasoning
would tell us that
$$R_m(n) = R_m(\infty) + O(n^{-m-1}),$$
and we would have the formula
$$\sum_{1\le k<n}\frac1{k(k+1)} = C - n^{-1} + c_2n^{-2} + c_3n^{-3} + \cdots + c_mn^{-m} + O(n^{-m-1}),$$
for certain constants $c_2$, $c_3$, .... We know that the $c$’s happen to be zero
in this case; but let’s prove it, just to restore some of our confidence (in
Euler’s formula if not in ourselves). The term $\ln\frac n{n+1}$ contributes $(-1)^m/m$
to $c_m$; the term $(-1)^{m+1}(B_m/m)n^{-m}$ contributes $(-1)^{m+1}B_m/m$; and the
term $(-1)^k(B_k/k)(n+1)^{-k}$ contributes $(-1)^m\binom{m-1}{k-1}B_k/k$. Therefore
$$(-1)^mc_m = \frac1m - \frac{B_m}m + \sum_{k=1}^m\binom{m-1}{k-1}\frac{B_k}k = \frac1m\bigl(1 - B_m + B_m(1) - 1\bigr).$$
Sure enough, it’s zero, when $m > 1$. We have proved that
$$\sum_{1\le k<n}\frac1{k(k+1)} = C - n^{-1} + O(n^{-m-1}),\qquad\text{for all } m \ge 1. \tag{9.82}$$
This is not enough to prove that the sum is exactly equal to $C - n^{-1}$; the
actual value may be $C - n^{-1} + 2^{-n}$ or something. But Euler’s summation
formula does give us $O(n^{-m-1})$ for arbitrarily large $m$, even though we haven’t
evaluated any remainders explicitly.
Summation 1, again: Recapitulation and generalization.
Before we leave our training wheels, let’s review what we just did from
a somewhat higher perspective. We began with a sum
$$S_n = \sum_{1\le k<n} f(k)$$
and we used Euler’s summation formula to write
$$S_n = F(n) - F(1) + \sum_{k=1}^{m}\bigl(T_k(n) - T_k(1)\bigr) + R_m(n), \tag{9.83}$$
where F(x) was $\int f(x)\,dx$ and where $T_k(x)$ was a certain term involving $B_k$ and $f^{(k-1)}(x)$. We also noticed that there was a constant c such that
$$f^{(m)}(x) = O(x^{c-m}) \quad\text{as } x \to \infty, \qquad\text{for all large } m.$$
(Namely, f(k) was $1/k(k+1)$; F(x) was $\ln(x/(x+1))$; $T_k(x)$ was $(-1)^{k+1}(B_k/k)\bigl(x^{-k} - (x+1)^{-k}\bigr)$; and c was −2.) For all large enough values of m, this implied that the remainders had a small tail,
$$R'_m(n) = R_m(\infty) - R_m(n) = (-1)^{m+1}\int_n^\infty \frac{B_m(\{x\})}{m!}\,f^{(m)}(x)\,dx = O(n^{c+1-m}). \tag{9.84}$$
Therefore we were able to conclude that there exists a constant C such that
$$S_n = F(n) + C + \sum_{k=1}^{m} T_k(n) - R'_m(n). \tag{9.85}$$
(Notice that C nicely absorbed the $T_k(1)$ terms, which were a nuisance.)
We can save ourselves unnecessary work in future problems by simply asserting the existence of C whenever $R_m(\infty)$ exists.
Now let’s suppose that $f^{(2m+2)}(x) \ge 0$ and $f^{(2m+4)}(x) \ge 0$ for $1 \le x \le n$. We have proved that this implies a simple bound (9.80) on the remainder,
$$R_{2m}(n) = \theta_{m,n}\bigl(T_{2m+2}(n) - T_{2m+2}(1)\bigr),$$
where $\theta_{m,n}$ lies somewhere between 0 and 1. But we don’t really want bounds that involve $R_{2m}(n)$ and $T_{2m+2}(1)$; after all, we got rid of $T_k(1)$ when we introduced the constant C. What we really want is a bound like
$$-R'_{2m}(n) = \phi_{m,n}\,T_{2m+2}(n),$$
where $0 < \phi_{m,n} < 1$; this will allow us to conclude from (9.85) that
$$S_n = F(n) + C + T_1(n) + \sum_{k=1}^{m} T_{2k}(n) + \phi_{m,n}\,T_{2m+2}(n), \tag{9.86}$$
hence the remainder will truly be between zero and the first discarded term.
A slight modification of our previous argument will patch things up per-
fectly. Let us assume that
$$f^{(2m+2)}(x) \ge 0 \quad\text{and}\quad f^{(2m+4)}(x) \ge 0, \qquad\text{as } x \to \infty. \tag{9.87}$$
The right-hand side of (9.85) is just like the negative of the right-hand side of
Euler’s summation formula (9.67) with a = n and b = ∞, as far as remainder
terms are concerned, and successive remainders are generated by induction
on m. Therefore our previous argument can be applied.
Summation 2: Harmonic numbers harmonized.
Now that we’ve learned so much from a trivial (but safe) example, we can
readily do a nontrivial one. Let us use Euler’s summation formula to derive
the approximation for $H_n$ that we have been claiming for some time.
In this case, f(x) = 1/x. We already know about the integral and derivatives of f, because of Summation 1; also $f^{(m)}(x) = O(x^{-m-1})$ as $x \to \infty$. Therefore we can immediately plug into formula (9.85):
$$\sum_{1\le k<n} \frac{1}{k} = \ln n + C + B_1 n^{-1} - \sum_{k=1}^{m}\frac{B_{2k}}{2k\,n^{2k}} - R'_{2m}(n),$$
for some constant C. The sum on the left is $H_{n-1}$, not $H_n$; but it’s more convenient to work with $H_{n-1}$ and to add 1/n later, than to mess around with (n + 1)’s on the right-hand side. The $B_1 n^{-1}$ will then become $(B_1 + 1)n^{-1} = 1/(2n)$. Let us call the constant γ instead of C, since Euler’s constant γ is, in fact, defined to be $\lim_{n\to\infty}(H_n - \ln n)$.
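The remainder term can be estimated nicely by the theory we developed a minute ago, because $f^{(2m)}(x) = (2m)!/x^{2m+1} \ge 0$ for all x > 0. Therefore (9.86) tells us that
$$H_n = \ln n + \gamma + \frac{1}{2n} - \sum_{k=1}^{m}\frac{B_{2k}}{2k\,n^{2k}} - \theta_{m,n}\frac{B_{2m+2}}{(2m+2)\,n^{2m+2}}, \tag{9.88}$$
where $\theta_{m,n}$ is some fraction between 0 and 1. This is the general formula whose first few terms are listed in Table 438. For example, when m = 2 we get
$$H_n = \ln n + \gamma + \frac{1}{2n} - \frac{1}{12n^2} + \frac{1}{120n^4} - \frac{\theta_{2,n}}{252n^6}. \tag{9.89}$$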
This equation, incidentally, gives us a good approximation to γ even when n = 2:
$$\gamma = H_2 - \ln 2 - \frac{1}{4} + \frac{1}{48} - \frac{1}{1920} + \epsilon = 0.577165\ldots + \epsilon,$$
where ε is between zero and $\frac{1}{16128}$. If we take $n = 10^4$ and m = 250, we get the value of γ correct to 1271 decimal places, beginning thus [171]:
$$\gamma = 0.57721566490153286060651209008240243\ldots. \tag{9.90}$$
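As a hedged numerical illustration of (9.89), the sketch below drops the θ term and solves for γ, then compares the gap with the bound $1/(252n^6)$. It is not from the original text: the function names are made up for this example, and the reference value `EULER_GAMMA` is simply the leading digits of (9.90) rounded to double precision.

```python
import math

# Leading digits of (9.90), rounded to a double; used only as a reference value.
EULER_GAMMA = 0.5772156649015329

def harmonic(n):
    """H_n computed by direct summation."""
    return math.fsum(1.0 / k for k in range(1, n + 1))

def gamma_estimate(n):
    """(9.89) solved for gamma, with the theta term omitted."""
    return harmonic(n) - math.log(n) - 1 / (2 * n) + 1 / (12 * n**2) - 1 / (120 * n**4)

for n in (2, 5, 10):
    estimate = gamma_estimate(n)
    gap = EULER_GAMMA - estimate      # equals theta / (252 n^6) with 0 < theta < 1
    print(n, estimate, gap, 1 / (252 * n**6))
```

In each row the gap should be positive and smaller than the last column, in line with the claim that the remainder lies between zero and the first discarded term.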
But Euler’s constant appears also in other formulas that allow it to be eval-
uated even more efficiently
[282].
Summation 3: Stirling’s approximation.
If f(x) = ln x, we have f'(x) = 1/x, so we can evaluate the sum of logarithms using almost the same calculations as we did when summing reciprocals. Euler’s summation formula yields
$$\sum_{1\le k<n} \ln k = n\ln n - n + \sigma - \frac{\ln n}{2} + \sum_{k=1}^{m}\frac{B_{2k}}{2k(2k-1)\,n^{2k-1}} + \phi_{m,n}\frac{B_{2m+2}}{(2m+2)(2m+1)\,n^{2m+1}},$$
where σ is a certain constant, “Stirling’s constant,” and $0 < \phi_{m,n} < 1$.
(In this case $f^{(2m)}(x)$ is negative, not positive; but we can still say that the remainder is governed by the first discarded term, because we could have started with f(x) = −ln x instead of f(x) = ln x.) Adding ln n to both sides gives
$$\ln n! = n\ln n - n + \frac{\ln n}{2} + \sigma + \frac{1}{12n} - \frac{1}{360n^3} + \frac{\phi_{2,n}}{1260n^5} \tag{9.91}$$
when m = 2. And we can get the approximation in Table 438 by taking ‘exp’ of both sides. (The value of $e^\sigma$ turns out to be $\sqrt{2\pi}$, but we aren’t quite ready to derive that formula. In fact, Stirling didn’t discover the closed form for σ until several years after de Moivre [64] had proved that the constant exists.)
If m is fixed and $n \to \infty$, the general formula gives a better and better approximation to ln n! in the sense of absolute error, hence it gives a better and better approximation to n! in the sense of relative error. But if n is fixed and m increases, the error bound $|B_{2m+2}|/(2m+2)(2m+1)n^{2m+1}$ decreases to a certain point and then begins to increase. Therefore the approximation reaches a point beyond which a sort of uncertainty principle limits the amount by which n! can be approximated.
[Margin note: Heisenberg may have been here.]
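The turning point of the error bound is easy to see numerically. The sketch below is an illustration only, with hypothetical helper names and the same assumed Bernoulli recurrence as in standard references ($B_1 = -1/2$). It tabulates $|B_{2m+2}|/(2m+2)(2m+1)n^{2m+1}$ for the small fixed value n = 2 and reports where the bound is smallest.

```python
from fractions import Fraction
from math import comb

def bernoulli(limit):
    """Return [B_0, ..., B_limit] as exact fractions, with B_1 = -1/2."""
    B = [Fraction(1)]
    for m in range(1, limit + 1):
        B.append(-sum(comb(m + 1, j) * B[j] for j in range(m)) / (m + 1))
    return B

def error_bound(n, m, B):
    """|B_{2m+2}| / ((2m+2)(2m+1) n^(2m+1)): size of the first discarded term."""
    return abs(B[2 * m + 2]) / ((2 * m + 2) * (2 * m + 1) * n ** (2 * m + 1))

n = 2
B = bernoulli(34)
bounds = [error_bound(n, m, B) for m in range(16)]
best_m = min(range(16), key=bounds.__getitem__)
for m, bound in enumerate(bounds):
    print(m, float(bound))
print("smallest bound at m =", best_m)
```

For n = 2 the bound shrinks until m = 6 and grows afterwards, so adding more terms beyond that point makes the guaranteed accuracy worse rather than better.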
In Chapter 5, equation (5.83), we generalized factorials to arbitrary real α by using a definition
$$\frac{1}{\alpha!} = \lim_{n\to\infty}\binom{n+\alpha}{n}\,n^{-\alpha}$$
suggested by Euler. Suppose α is a large number; then
$$\ln \alpha! = \lim_{n\to\infty}\Bigl(\alpha\ln n + \ln n! - \sum_{k=1}^{n}\ln(\alpha+k)\Bigr),$$
k=l
and Euler’s summation formula can be used with f(x) = ln(x+ a) to estimate
this sum:
ln(k+ a) =
F,(a,n)
-
F,(a,O) +
Rz,,,(a,n)
,
k=l
ln(x + a)
F,(a,x)
=
(x+a)ln(x+cx-xf
2
B2k
k=,
2k(2k
-
1 )(x +
a)2kp1
J
n
Bz,n(IxI)
dx
RI,,,(a,n) = --
0
.2m
(x +
a)2m
(Here we have used (9.67) with a = 0 and b = n, then added ln(n + α) − ln α to both sides.) If we subtract this approximation for $\sum_{k=1}^{n}\ln(k+\alpha)$ from Stirling’s approximation for ln n!, then add α ln n and take the limit as $n \to \infty$, we get
$$\ln \alpha! = \alpha\ln\alpha - \alpha + \frac{\ln\alpha}{2} + \sigma + \sum_{k=1}^{m}\frac{B_{2k}}{(2k)(2k-1)\,\alpha^{2k-1}} - \int_0^\infty \frac{B_{2m}(\{x\})}{2m}\,\frac{dx}{(x+\alpha)^{2m}},$$
because $\alpha\ln n + n\ln n - n + \frac12\ln n - (n+\alpha)\ln(n+\alpha) + n - \frac12\ln(n+\alpha) \to -\alpha$ and the other terms not shown here tend to zero. Thus Stirling’s approximation behaves for generalized factorials (and for the Gamma function Γ(α + 1) = α!) exactly as for ordinary factorials.
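A quick numerical check of this claim for non-integer arguments can be sketched as follows. This is an illustration under stated assumptions rather than anything from the original: it uses the m = 2 form of (9.91) with α in place of n, takes $\sigma = \frac12\ln 2\pi$ (the value the text says $e^\sigma = \sqrt{2\pi}$ will turn out to have), and relies on Python's `math.lgamma`, which returns ln Γ(x).

```python
import math

def stirling_log_factorial(alpha):
    """Right side of (9.91) at alpha, without the phi term, with sigma = ln(2*pi)/2."""
    sigma = 0.5 * math.log(2 * math.pi)
    return (alpha * math.log(alpha) - alpha + 0.5 * math.log(alpha) + sigma
            + 1 / (12 * alpha) - 1 / (360 * alpha**3))

def stirling_gap(alpha):
    """ln(alpha!) minus the truncated series; expected to be phi / (1260 alpha^5)."""
    return math.lgamma(alpha + 1) - stirling_log_factorial(alpha)

for alpha in (5.0, 10.5):
    print(alpha, stirling_gap(alpha), 1 / (1260 * alpha**5))
```

The gap is positive and below $1/(1260\alpha^5)$ for both the integer and the half-integer argument, which is the same "between zero and the first discarded term" behavior as for ordinary factorials.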
Summation 4: A bell-shaped summand.
Let’s turn now to a sum that has quite a different flavor:
$$\Theta_n = \sum_k e^{-k^2/n} = \cdots + e^{-9/n} + e^{-4/n} + e^{-1/n} + 1 + e^{-1/n} + e^{-4/n} + e^{-9/n} + \cdots. \tag{9.92}$$
This is a doubly infinite sum, whose terms reach their maximum value $e^0 = 1$ when k = 0. We call it $\Theta_n$ because it is a power series involving the quantity $e^{-1/n}$ raised to the p(k)th power, where p(k) is a polynomial of degree 2; such power series are traditionally called “theta functions.” If $n = 10^{100}$, we have
$$e^{-k^2/n} = \begin{cases} e^{-.01} \approx 0.99005, & \text{when } k = 10^{49};\\ e^{-1} \approx 0.36788, & \text{when } k = 10^{50};\\ e^{-100} < 10^{-43}, & \text{when } k = 10^{51}.\end{cases}$$
So the summand stays very near 1 until k gets up to about $\sqrt n$, when it drops off and stays very near zero. We can guess that $\Theta_n$ will be proportional to $\sqrt n$. Here is a graph of $e^{-k^2/n}$ when n = 10:
[Graph of $e^{-k^2/n}$ for n = 10 not reproduced.]
Larger values of n just stretch the graph horizontally by a factor of $\sqrt n$.
We can estimate $\Theta_n$ by letting $f(x) = e^{-x^2/n}$ and taking a = −∞, b = +∞ in Euler’s summation formula. (If infinities seem too scary, let a = −A and b = +B, then take limits as A, B → ∞.) The integral of f(x) is
$$\int_{-\infty}^{+\infty} e^{-x^2/n}\,dx = \sqrt n\int_{-\infty}^{+\infty} e^{-u^2}\,du = \sqrt n\,C,$$
if we replace x by $u\sqrt n$. The value of $\int_{-\infty}^{+\infty} e^{-u^2}\,du$ is well known, but we’ll call it C for now and come back to it after we have finished plugging into Euler’s summation formula.
The next thing we need to know is the sequence of derivatives f'(x), f''(x), ..., and for this purpose it’s convenient to set
$$f(x) = g(x/\sqrt n), \qquad g(x) = e^{-x^2}.$$
Then the chain rule of calculus says that
$$\frac{df(x)}{dx} = \frac{dg(y)}{dy}\,\frac{dy}{dx}, \qquad y = \frac{x}{\sqrt n};$$
and this is the same as saying that
$$f'(x) = \frac{1}{\sqrt n}\,g'(x/\sqrt n).$$
By induction we have
$$f^{(k)}(x) = n^{-k/2}\,g^{(k)}(x/\sqrt n).$$
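For example, we have $g'(x) = -2xe^{-x^2}$ and $g''(x) = (4x^2 - 2)e^{-x^2}$; hence
$$f'(x) = \frac{1}{\sqrt n}\Bigl(-2\frac{x}{\sqrt n}\Bigr)e^{-x^2/n}, \qquad f''(x) = \frac{1}{n}\Bigl(4\Bigl(\frac{x}{\sqrt n}\Bigr)^2 - 2\Bigr)e^{-x^2/n}.$$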
It’s easier to see what’s going on if we work with the simpler function g(x).
We don’t have to evaluate the derivatives of g(x) exactly, because we’re
only going to be concerned about the limiting values when x = ±∞. And for this purpose it suffices to notice that every derivative of g(x) is $e^{-x^2}$ times a polynomial in x:
$$g^{(k)}(x) = P_k(x)\,e^{-x^2}, \qquad\text{where } P_k \text{ is a polynomial of degree } k.$$
This follows by induction.
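The negative exponential $e^{-x^2}$ goes to zero much faster than $P_k(x)$ goes to infinity, when x → ±∞, so we have
$$f^{(k)}(+\infty) = f^{(k)}(-\infty) = 0$$
for all k ≥ 0. Therefore all of the terms
$$\sum_{k=1}^{m}\frac{B_k}{k!}\,f^{(k-1)}(x)\Big|_{-\infty}^{+\infty}$$
vanish, and we are left with the term from $\int f(x)\,dx$ and the remainder:
$$\Theta_n = C\sqrt n + (-1)^{m+1}\int_{-\infty}^{+\infty}\frac{B_m(\{x\})}{m!}\,f^{(m)}(x)\,dx = C\sqrt n + \frac{(-1)^{m+1}}{n^{(m-1)/2}}\int_{-\infty}^{+\infty}\frac{B_m(\{u\sqrt n\})}{m!}\,P_m(u)\,e^{-u^2}\,du = C\sqrt n + O(n^{(1-m)/2}).$$
The O estimate here follows since $|B_m(\{u\sqrt n\})|$ is bounded and the integral $\int_{-\infty}^{+\infty}|P(u)|\,e^{-u^2}\,du$ exists whenever P is a polynomial. (The constant implied by this O depends on m.)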
We have proved that $\Theta_n = C\sqrt n + O(n^{-M})$, for arbitrarily large M; the difference between $\Theta_n$ and $C\sqrt n$ is “exponentially small.” Let us therefore determine the constant C that plays such a big role in the value of $\Theta_n$.
One way to determine C is to look the integral up in a table; but we
prefer to know how the value can be derived, so that we can do integrals even
when they haven’t been tabulated. Elementary calculus suffices to evaluate C if we are clever enough to look at the double integral
$$C^2 = \int_{-\infty}^{+\infty} e^{-x^2}\,dx\int_{-\infty}^{+\infty} e^{-y^2}\,dy = \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} e^{-(x^2+y^2)}\,dx\,dy.$$
Converting to polar coordinates gives
$$C^2 = \int_0^{2\pi}\!\int_0^\infty e^{-r^2}\,r\,dr\,d\theta = \frac12\int_0^{2\pi} d\theta\int_0^\infty e^{-u}\,du \quad (u = r^2) \quad = \frac12\int_0^{2\pi} d\theta = \pi.$$
So $C = \sqrt\pi$. The fact that $x^2 + y^2 = r^2$ is the equation of a circle whose circumference is 2πr somehow explains why π gets into the act.
Another way to evaluate C is to replace x by $\sqrt t$ and dx by $\frac12 t^{-1/2}\,dt$:
$$C = \int_{-\infty}^{+\infty} e^{-x^2}\,dx = 2\int_0^\infty e^{-x^2}\,dx = \int_0^\infty t^{-1/2}e^{-t}\,dt.$$
This integral equals $\Gamma(\frac12)$, since $\Gamma(\alpha) = \int_0^\infty t^{\alpha-1}e^{-t}\,dt$ according to (5.84). Therefore we have demonstrated that $\Gamma(\frac12) = \sqrt\pi$.
Our final formula, then, is
$$\Theta_n = \sum_k e^{-k^2/n} = \sqrt{\pi n} + O(n^{-M}), \qquad\text{for all fixed } M. \tag{9.93}$$
The constant in the O depends on M; that’s why we say that M is “fixed.”
When n = 2, for example, the infinite sum $\Theta_2$ is equal to 2.506628288; this is already an excellent approximation to $\sqrt{2\pi} = 2.506628275$, even though n is quite small. The value of $\Theta_{100}$ agrees with $10\sqrt\pi$ to 427 decimal places! Exercise 59 uses advanced methods to derive a rapidly convergent series for $\Theta_n$; it turns out that
$$\Theta_n/\sqrt{\pi n} = 1 + 2e^{-n\pi^2} + O(e^{-4n\pi^2}). \tag{9.94}$$
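The numbers quoted for n = 2 can be reproduced with a few lines of floating-point arithmetic. This sketch is an illustration, not part of the original; the function name `theta` and the truncation parameter `terms` are choices made here, and the truncation is harmless because the summand underflows to zero long before |k| reaches 200 when n is small.

```python
import math

def theta(n, terms=200):
    """Sum of exp(-k^2/n) over all integers k, truncated at |k| <= terms."""
    return 1.0 + 2.0 * math.fsum(math.exp(-k * k / n) for k in range(1, terms + 1))

n = 2
ratio = theta(n) / math.sqrt(math.pi * n)
print(theta(n))                                   # compare with 2.506628288 in the text
print(ratio - 1, 2 * math.exp(-n * math.pi**2))   # the two sides of (9.94), up to O(e^{-4 n pi^2})
```

Both printed quantities on the last line are about 5.35e-9, which is why $\Theta_2$ and $\sqrt{2\pi}$ first differ in the eighth decimal place.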
Summation 5: The clincher.
Now we will do one last sum, which will turn out to tell us the value
of Stirling’s constant σ. This last sum also illustrates many of the other
techniques of this last chapter (and of this whole book), so it will be a fitting
way for us to conclude our explorations of Concrete Mathematics.
The final task seems almost absurdly easy: We will try to find the asymptotic value of
$$A_n = \sum_k \binom{2n}{k}$$
by using Euler’s summation formula.
This is another case where we already know the answer (right?); but
it’s always interesting to try new methods on old problems, so that we can
compare facts and maybe discover something new.
So we THINK BIG and realize that the main contribution to $A_n$ comes from the middle terms, near k = n. It’s almost always a good idea to choose notation so that the biggest contribution to a sum occurs near k = 0, because we can then use the tail-exchange trick to get rid of terms that have large |k|. Therefore we replace k by n + k:
$$A_n = \sum_k \binom{2n}{n+k} = \sum_k \frac{(2n)!}{(n+k)!\,(n-k)!}.$$
Things are looking reasonably good, since we know how to approximate (n ± k)! when n is large and k is small.
Now we want to carry out the three-step procedure associated with the
tail-exchange trick. Namely, we want to write
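$$\frac{(2n)!}{(n+k)!\,(n-k)!} = a_k(n) = b_k(n) + O\bigl(c_k(n)\bigr), \qquad\text{for } k \in D_n,$$
so that we can obtain the estimate
$$A_n = \sum_k b_k(n) + O\Bigl(\sum_{k\notin D_n} a_k(n)\Bigr) + O\Bigl(\sum_{k\notin D_n} b_k(n)\Bigr) + \sum_{k\in D_n} O\bigl(c_k(n)\bigr).$$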
Let us therefore try to estimate $\binom{2n}{n+k}$ in the region where |k| is small. We could use Stirling’s approximation as it appears in Table 438, but it’s easier to work with the logarithmic equivalent in (9.91):
$$\begin{aligned}
\ln a_k(n) &= \ln(2n)! - \ln(n+k)! - \ln(n-k)!\\
&= 2n\ln 2n - 2n + \tfrac12\ln 2n + \sigma + O(n^{-1})\\
&\quad - (n+k)\ln(n+k) + n + k - \tfrac12\ln(n+k) - \sigma + O\bigl((n+k)^{-1}\bigr)\\
&\quad - (n-k)\ln(n-k) + n - k - \tfrac12\ln(n-k) - \sigma + O\bigl((n-k)^{-1}\bigr).
\end{aligned} \tag{9.95}$$
We want to convert this to a nice, simple O estimate.
The tail-exchange method allows us to work with estimates that are valid only when k is in the “dominant” set $D_n$. But how should we define $D_n$? We have to make $D_n$ small enough that we can make a good estimate; for example, we had better not let k get near n, or the term $O((n-k)^{-1})$ in (9.95) will blow up. Yet $D_n$ must be large enough that the tail terms (the terms with $k \notin D_n$) are negligibly small compared with the overall sum. Trial and error is usually necessary to find an appropriate set $D_n$; in this problem the calculations we are about to make will show that it’s wise to define things as follows:
$$k \in D_n \iff |k| \le n^{1/2+\epsilon}. \tag{9.96}$$
[Margin note: Actually I’m not into dominance.]
Here ε is a small positive constant that we can choose later, after we get to know the territory. (Our O estimates will depend on the value of ε.) Equation (9.95) now reduces to
$$\ln a_k(n) = (2n+\tfrac12)\ln 2 - \sigma - \tfrac12\ln n + O(n^{-1}) - (n+k+\tfrac12)\ln(1+k/n) - (n-k+\tfrac12)\ln(1-k/n). \tag{9.97}$$
(We have pulled out the large parts of the logarithms, writing
$$\ln(n \pm k) = \ln n + \ln(1 \pm k/n),$$
and this has made a lot of ln n terms cancel out.)
Now we need to expand the terms ln(1 ± k/n) asymptotically, until we have an error term that approaches zero as $n \to \infty$. We are multiplying ln(1 ± k/n) by $(n \pm k + \frac12)$, so we should expand the logarithm until we reach $o(n^{-1})$, using the assumption that $|k| \le n^{1/2+\epsilon}$:
$$\ln\Bigl(1 \pm \frac{k}{n}\Bigr) = \pm\frac{k}{n} - \frac{k^2}{2n^2} + O(n^{-3/2+3\epsilon}).$$
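Multiplication by $n \pm k + \frac12$ yields
$$\pm k - \frac{k^2}{2n} + \frac{k^2}{n} + O(n^{-1/2+3\epsilon}),$$
plus other terms that are absorbed in the $O(n^{-1/2+3\epsilon})$. So (9.97) becomes
$$\ln a_k(n) = (2n+\tfrac12)\ln 2 - \sigma - \tfrac12\ln n - k^2/n + O(n^{-1/2+3\epsilon}).$$
Taking exponentials, we have
$$a_k(n) = \frac{2^{2n+1/2}}{e^\sigma\sqrt n}\,e^{-k^2/n}\bigl(1 + O(n^{-1/2+3\epsilon})\bigr). \tag{9.98}$$
This is our approximation, with
$$b_k(n) = \frac{2^{2n+1/2}}{e^\sigma\sqrt n}\,e^{-k^2/n}, \qquad c_k(n) = 2^{2n}\,n^{-1+3\epsilon}\,e^{-k^2/n}.$$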
Notice that k enters $b_k(n)$ and $c_k(n)$ in a very simple way. We’re in luck, because we will be summing over k.
The tail-exchange trick tells us that $\sum_k a_k(n)$ will be approximately $\sum_k b_k(n)$ if we have done a good job of estimation. Let us therefore evaluate
$$\sum_k b_k(n) = \frac{2^{2n+1/2}}{e^\sigma\sqrt n}\sum_k e^{-k^2/n} = \frac{2^{2n+1/2}}{e^\sigma\sqrt n}\,\Theta_n = \frac{2^{2n}\sqrt{2\pi}}{e^\sigma}\bigl(1 + O(n^{-M})\bigr).$$
(Another stroke of luck: We get to use the sum $\Theta_n$ from the previous example.) This is encouraging, because we know that the original sum is actually
$$A_n = \sum_k \binom{2n}{k} = (1+1)^{2n} = 2^{2n}.$$
Therefore it looks as if we will have $e^\sigma = \sqrt{2\pi}$, as advertised.
But there’s a catch: We still need to prove that our estimates are good enough. So let’s look first at the error contributed by $c_k(n)$:
$$\Sigma_c(n) = \sum_{|k|\le n^{1/2+\epsilon}} 2^{2n}\,n^{-1+3\epsilon}\,e^{-k^2/n} \le 2^{2n}\,n^{-1+3\epsilon}\,\Theta_n = O(2^{2n}\,n^{-1/2+3\epsilon}).$$
Good; this is asymptotically smaller than the previous sum, if $3\epsilon < \frac12$.
Next we must check the tails. We have
$$\sum_{k>n^{1/2+\epsilon}} e^{-k^2/n} < \exp\bigl(-\lfloor n^{1/2+\epsilon}\rfloor^2/n\bigr)\,(1 + e^{-1/n} + e^{-2/n} + \cdots) = O(e^{-n^{2\epsilon}})\cdot O(n),$$
which is $O(n^{-M})$ for all M; so $\sum_{k\notin D_n} b_k(n)$ is asymptotically negligible. (We chose the cutoff at $n^{1/2+\epsilon}$ just so that $e^{-k^2/n}$ would be exponentially small outside of $D_n$. Other choices like $n^{1/2}\log n$ would have been good enough too, and the resulting estimates would have been slightly sharper, but the formulas would have come out more complicated. We need not make the strongest possible estimates, since our main goal is to establish the value of the constant σ.) Similarly, the other tail
$$\sum_{k>n^{1/2+\epsilon}} \binom{2n}{n+k}$$
is bounded by 2n times its largest term, which occurs at the cutoff point $k \approx n^{1/2+\epsilon}$. This term is known to be approximately $b_k(n)$, which is exponentially small compared with $A_n$; and an exponentially small multiplier wipes out the factor of 2n.
[Margin note: What an amazing coincidence.]
[Margin note: I’m tired of getting to the end of long, hard books and not even getting a word of good wishes from the author. It would be nice to read a “thanks for reading this, hope it comes in handy,” instead of just running into a hard, cold, cardboard cover at the end of a long, dry proof. You know?]
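Thus we have successfully applied the tail-exchange trick to prove the estimate
$$2^{2n} = \frac{\sqrt{2\pi}}{e^\sigma}\,2^{2n}\bigl(1 + O(n^{-1/2+3\epsilon})\bigr), \qquad\text{if } 0 < \epsilon < \tfrac16. \tag{9.99}$$
We may choose $\epsilon = \frac18$ and conclude that
$$\sigma = \tfrac12\ln 2\pi.$$
QED.
[Margin note: Thanks for reading this, hope it comes in handy. The authors]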
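To see how closely $b_k(n)$ tracks $a_k(n)$ inside the dominant set, one can compare their logarithms numerically. The sketch below is only an illustration: the function names are hypothetical, `math.lgamma(x + 1)` is used for ln x!, and σ is set to $\frac12\ln 2\pi$ as just derived. The O estimate in (9.98) has an unspecified constant, so the comparison with $n^{-1/2+3\epsilon}$ shows the order of magnitude, not a proof.

```python
import math

def log_a(n, k):
    """ln of (2n)! / ((n+k)! (n-k)!), using lgamma(x + 1) = ln x!."""
    return math.lgamma(2 * n + 1) - math.lgamma(n + k + 1) - math.lgamma(n - k + 1)

def log_b(n, k):
    """ln b_k(n) = (2n + 1/2) ln 2 - sigma - (ln n)/2 - k^2/n with sigma = ln(2*pi)/2."""
    sigma = 0.5 * math.log(2 * math.pi)
    return (2 * n + 0.5) * math.log(2) - sigma - 0.5 * math.log(n) - k * k / n

n, eps = 10_000, 1 / 8
cutoff = int(n ** (0.5 + eps))        # D_n is the set |k| <= n^(1/2 + eps)
worst = max(abs(log_a(n, k) - log_b(n, k)) for k in range(-cutoff, cutoff + 1))
print(cutoff, worst, n ** (-0.5 + 3 * eps))
```

With n = 10000 the worst discrepancy over $D_n$ is on the order of $10^{-3}$, comfortably below $n^{-1/8} \approx 0.32$, and it occurs at the edge of the dominant set where the neglected terms in the logarithm expansion are largest.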
Exercises
Warmups
1 Prove or disprove: If $f_1(n) \prec g_1(n)$ and $f_2(n) \prec g_2(n)$, then we have $f_1(n) + f_2(n) \prec g_1(n) + g_2(n)$.
2 Which function grows faster:
a $n^{(\ln n)}$ or $(\ln n)^n$?
b $n^{(\ln\ln\ln n)}$ or $(\ln n)!$?
c $(n!)!$ or $((n-1)!)!\,(n-1)!^{\,n!}$?
d $F^2_{\lceil H_n\rceil}$ or $H_{F_n}$?
3 What’s wrong with the following argument? “Since n = O(n) and 2n = O(n) and so on, we have $\sum_{k=1}^{n} kn = \sum_{k=1}^{n} O(n) = O(n^2)$.”
4 Give an example of a valid equation that has O-notation on the left but not on the right. (Do not use the trick of multiplying by zero; that’s too easy.) Hint: Consider taking limits.
5 Prove or disprove: O(f(n) + g(n)) = f(n) + O(g(n)), if f(n) and g(n) are positive for all n. (Compare with (9.27).)
6 Multiply $(\ln n + \gamma + O(1/n))$ by $(n + O(\sqrt n))$, and express your answer in O-notation.
7 Estimate $\sum_{k\ge0} e^{-k/n}$ with absolute error $O(n^{-1})$.
Basics
8 Give an example of functions f(n) and g(n) such that none of the three relations $f(n) \prec g(n)$, $f(n) \succ g(n)$, $f(n) \asymp g(n)$ is valid, although f(n) and g(n) both increase monotonically to ∞.
9 Prove (9.22) rigorously by showing that the left side is a subset of the
right side, according to the set-of-functions definition of 0.
10 Prove or disprove: $\cos O(x) = 1 + O(x^2)$ for all real x.
11 Prove or disprove: $O(x+y)^2 = O(x^2) + O(y^2)$.
12 Prove that $1 + \frac2n + O(n^{-2}) = (1 + \frac2n)(1 + O(n^{-2}))$, as $n \to \infty$.
13 Evaluate $(n + 2 + O(n^{-1}))^n$ with relative error $O(n^{-1})$.
14 Prove that $(n+\alpha)^{n+\beta} = n^{n+\beta}e^{\alpha}\bigl(1 + \alpha(\beta - \frac12\alpha)n^{-1} + O(n^{-2})\bigr)$.
15 Give an asymptotic formula for the “middle” trinomial coefficient $\binom{3n}{n,n,n}$, correct to relative error $O(n^{-3})$.
16 Show that if $B(1-x) = -B(x) \ge 0$ for $0 < x < \frac12$, we have
$$\int_a^b B(\{x\})\,f(x)\,dx \ge 0$$
if we assume also that $f'(x) \ge 0$ for $a \le x \le b$.
17 Use generating functions to show that $B_m(\frac12) = (2^{1-m} - 1)B_m$, for all m ≥ 0.
18 Find $\sum_k \binom{2n}{k}^{\alpha}$ with relative error $O(n^{-1/4})$, when α > 0.
Homework exercises
19 Use a computer to compare the left and right sides of the approximations in Table 438, when n = 10, z = α = 0.1, and O(f(n)) = O(f(z)) = 0.
20 Prove or disprove the following estimates, as $n \to \infty$:
a $O\bigl((n^2/\log\log n)^{1/2}\bigr) = O\bigl(\lfloor\sqrt n\rfloor^2\bigr)$.
b $e^{(1+O(1/n))^2} = e + O(1/n)$.
c $n! = O\bigl(((1 - 1/n)^n\,n)^n\bigr)$.
21 Equation (9.48) gives the nth prime with relative error $O(\log n)^{-2}$. Improve the relative error to $O(\log n)^{-3}$ by starting with another term of (9.31) in (9.46).
22 Improve (9.54) to $O(n^{-3})$.
23 Push the approximation (9.62) further, getting absolute error $O(n^{-3})$. Hint: Let $g_n = c/(n+1)(n+2) + h_n$; what recurrence does $h_n$ satisfy?
24 Suppose $a_n = O(f(n))$ and $b_n = O(f(n))$. Prove or disprove that the convolution $\sum_{k=0}^{n} a_k b_{n-k}$ is also O(f(n)), in the following cases:
a $f(n) = n^{-\alpha}$, α > 1.
b $f(n) = \alpha^{-n}$, α > 1.
25 Prove (9.1) and (9.2), with which we opened this chapter.
26 Equation (9.91) shows how to evaluate ln 10! with an absolute error < $\frac{1}{126000000}$. Therefore if we take exponentials, we get 10! with a relative error that is less than $e^{1/126000000} - 1 < 10^{-8}$. (In fact, the approximation gives 3628799.9714.) If we now round to the nearest integer, knowing that 10! is an integer, we get an exact result.
Is it always possible to calculate n! in a similar way, if enough terms of Stirling’s approximation are computed? Estimate the value of m that gives the best approximation to ln n!, when n is a fixed (large) integer. Compare the absolute error in this approximation with n! itself.
27 Use Euler’s summation formula to find the asymptotic value of $H_n^{(-\alpha)} = \sum_{k=1}^{n} k^{\alpha}$, where α is any fixed real number. (Your answer may involve a constant that you do not know in closed form.)
28 Exercise 5.13 defines the hyperfactorial function $Q_n = 1^1 2^2 \ldots n^n$. Find the asymptotic value of $Q_n$ with relative error $O(n^{-1})$. (Your answer may involve a constant that you do not know in closed form.)
29 Estimate the function $1^{1/1}\,2^{1/2} \ldots n^{1/n}$ as in the previous exercise.
30 Find the asymptotic value of $\sum_{k\ge0} k^l e^{-k^2/n}$ with absolute error $O(n^{-3})$, when l is a fixed nonnegative integer.
31 Evaluate $\sum_{k\ge0} 1/(c^k + c^m)$ with absolute error $O(c^{-3m})$, when c > 1 and m is a positive integer.
Exam problems
32 Evaluate $e^{H_n + H_n^{(2)}}$ with absolute error $O(n^{-1})$.
33 Evaluate $\sum_{k\ge0} \binom{n}{k}/n^{\overline{k}}$ with absolute error $O(n^{-3})$.
34 Determine values A through F such that $(1 + 1/n)^{nH_n}$ is
$$An + B(\ln n)^2 + C\ln n + D + \frac{E(\ln n)^2}{n} + \frac{F\ln n}{n} + O(n^{-1}).$$
35 Evaluate $\sum_{k=1}^{n} 1/kH_k$ with absolute error O(1).
36 Evaluate $S_n = \sum_{k=1}^{n} 1/(n^2 + k^2)$ with absolute error $O(n^{-5})$.
37 Evaluate $\sum_{k=1}^{n} (n \bmod k)$ with absolute error O(n log n).
38 Evaluate $\sum_{k\ge0} k^k\binom{n}{k}$ with relative error $O(n^{-1})$.
39 Evaluate $\sum_{0\le k<n} \ln(n-k)(\ln n)^k/k!$ with absolute error $O(n^{-1})$. Hint: Show that the terms for k ≥ 10 ln n are negligible.
40 Let m be a (fixed) positive integer. Evaluate $\sum_{k=1}^{n} (-1)^k H_k^m$ with absolute error O(1).
41 Evaluate the “Fibonacci factorial” $\prod_{k=1}^{n} F_k$ with relative error $O(n^{-1})$ or better. Your answer may involve a constant whose value you do not know in closed form.
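42 Let α be a constant in the range $0 < \alpha < \frac12$. We’ve seen in previous chapters that there is no general closed form for the sum $\sum_{k\le\alpha n}\binom{n}{k}$. Show that there is, however, an asymptotic formula
where $H(\alpha) = \alpha\lg\frac{1}{\alpha} + (1-\alpha)\lg\bigl(\frac{1}{1-\alpha}\bigr)$. Hint: Show that $\binom{n}{k-1} < \frac{\alpha}{1-\alpha}\binom{n}{k}$ for 0 < k ≤ αn.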
43 Show that $C_n$, the number of ways to change n cents (as considered in Chapter 7) is asymptotically $cn^4 + O(n^3)$ for some constant c. What is that constant?
44 Prove that
as x
+
0;).
(Recall the definition
xz
=
x!/(x
-
i)!
in (5.88), and the
definition of generalized Stirling numbers in Table 258.)
45 Let a be an irrational number between 0 and 1. Chapter 3 discusses the
quantity D(
01,
n), which measures the maximum discrepancy by which
the fractional parts
{kOL)
for 0 6 k < n deviate from a uniform distribu-
tion. The recurrence
D(a,n) 6 D({PI-‘},
[an])
+ (x-’ + 2
was proved in (3.31); we also have the obvious bounds
0 6
D(cc,n)
6 n.
Prove that
lim,,,
D(
OL,
n)/n
= 0. Hint: Chapter 6 discusses continued
fractions.
46 Show that the Bell number
b,
= ee’ &>O
kn/k!
of exercise 7.15 is
asymptotically equal to
where m(n)
In
m(n) = n
-
i,
and estimate the relative error in this
approximation.
47 Let m be an integer 3 2. Analyze the two sums
n
El
log,
4
k=l
and
~Ilog,nl
;
which is asymptotically closer to log,,, n! ?
48 Consider a table of the harmonic numbers Hk for 1 < k 6 n in decimal
notation. The kth entry
fi(k
has been correctly rounded to
dk
significant
digits, where
dk
is just large enough to distinguish this value from the
values of
l-lk-1
and
Hk+l
. For example, here is an extract from the table,
showing five entries where
Hk
passes 10:
k
h
12364 9.99980041-
12365 9.99988128+
12366 9.99996215-
12367 10.00004301-
12368 10.00012386+
Estimate the total number of digits in the table, xc=,
dk,
with an abso-
lute error of 0 (n).
49
50
In Chapter 6 we considered the tale of a worm that reaches the end of a
stretching band after n seconds, where H,-1 < 100 6 H,. Prove that if
n is a positive integer such that
H,-l
6
016
H,, then
Venture capitalists in Silicon Valley are being offered a deal giving them
a chance for an exponential payoff on their investments: For an n mil-
lion dollar investment, where n 3 2, the GKP consortium promises to
pay up to N million dollars after one year, where N =
10n.
Of course
there’s some risk; the actual deal is that GKP pays k million dollars with
probability
l/
(k’H$‘), for each integer k in the range 1 6 k < N. (All
payments are in megabucks, that is, in exact multiples of
$l,OOO,OOO;
the
payoff is determined by a truly random process.) Notice that an investor
always gets at least a million dollars back.
9 EXERCISES 479
480 ASYMPTOTICS
a
What is the asymptotic expected return after one year, if n million
dollars are invested? (In other words, what is the mean value of the
payment?) Your answer should be correct within an absolute error
I once earned
of O(10-n) dollars.
O(lO-") dollars.
b
What is the asymptotic probability that you make a profit, if you
invest n million? (In other words, what is the chance that you get
back more than you put in?) Your answer here should be correct
within an absolute error of
O(np3).
Bonus problems
51
Prove or disprove: j,”
O(xp2)
dx =
O(n’)
as n
-+
co.
52
Show that there exists a power series A(z) = xk>c
anon,
convergent for
all complex
z,
such that
.n
A(n) +
nn”
>
n,
53
Prove that if f(x) is a function whose derivatives satisfy
f’(x)
6 0,
-f"(x)
6
0, fU'(X)
6
0,
.
..)
(-l)"f'"+')(x)
<
0
for all x 3 0, then we have
f(x) = f(O)+
yx+...+
:'""~~/xrnel + O(xm),
m .
for x
3
0.
In particular, the case f(x) =
-ln(l
+ x) proves
(9.64)
for all k,n > 0.
54 Let
f(x)
be a positive, differentiable function such that
xf'(x)
+
f(x) as
x
+
00.
Prove that
Ix
f(k)
-
k3n
k’+‘x
if
01
> 0.
Hint: Consider the quantity f(k
-
l)/(k
-
i)”
-
f(k +
i)/(k
+
i)“.
55 Improve
(9.99)
to relative error
O(n-3/2+5f).
56 The quantity Q(n) =
1 + $i$
+
ey
+. . . =
&,
nk/nk
occurs in the
analysis of many algorithms. Find its asymptotic value, with absolute
error o(l).
5’7
An asymptotic formula for Golomb’s sum
xka,
1
/kll
+log,
k]’
is derived
in
(9.54).
Find an asymptotic formula for the analogous sum without
floor brackets,
xk2,
l/k(
1 +log,
k)2.
Hint: We have j,” uee”ketu du =
l/(1
+tlnk)2.
9 EXERCISES 481
58 Prove that
B,(Ix})
=
-26
x
cos(2xk;m-
inrn)
,
kal
for m 3 2,
by using residue calculus, integrating
1
-f
2ni
$nize
dz
2ni ---
e2niz
_
1
=rn
on the square contour
z
=
xfiy,
where max(lxl,
IyI)
=
M+i,
and letting
the integer M tend to 00.
59 Let
o,(t)
=
tk
e
mik+t)‘/n,
a periodic function of t. Show that the
expansion of O,,(t) as a Fourier series is
o,(t)
= &Ei(l + 2eCnJn(cos27rt) + 2e 4XLn(cos4xt)
+2e~9X’n(cos6xt)+...).
(This formula gives a rapidly convergent series for the sum
0,
=
0,
(0)
in equation (g.g3).)
60 Explain why the coefficients in the asymptotic expansion
(?)
=
-&(I-&+j&+&-j&g++
all have denominators that are powers of 2.
61 Exercise 45 proves that the discrepancy D(
01,
n) is o(n) for all irrational
numbers
01.
Exhibit an irrational
01
such that D
(01,
n) is not 0
(n’
) for
any
c
> 0.
62
Given n, let
{
,;‘,)}
=
ma& {E} be the largest entry in row n of Stirling's
subset triangle. Show that for all sufficiently large n, we have m(n) =
1777(n)]
or m(n) =
[m(n)],
where
m(n)(%n)
+ 2) In@(n) + 2) = n(K(n) + 1)
Hint: This is difficult.
63 Prove that S.W. Golomb’s self-describing sequence of exercise 2.36 sat-
isfies f(n) =
@2m+n@m1
+
O(n+-‘/logn).
64 Find a proof of the identity
cos 2n7tx
x7-=
7T2
(x2 -x+ ;) for 0 6 x 6 1,
that uses only “Eulerian” (eighteenth-century) mathematics.
.
482 ASYMPTOTIC3
Research problems
65
66
67
Find a “combinatorial” proof of Stirling’s approximation. (Note that nn
is the number of mappings of
{
1,2, . . . ,
n} into itself, and n! is the number
of mappings of {1
,2,.
. . , n} onto itself.)
Consider an n x n array of dots, n 3 3, in which each dot has four
neighbors. (At the edges we
“wrap around” modulo n.) Let
xn
be the
number of ways to assign the colors red, white, and blue to these dots in
such a way that no neighboring dots have the same color. (Thus x3 = 12.)
Prove that
Let Q,, be the least integer m such that
H,
> n. Find the smallest
integer n such that Q,, #
[enmy
+
;I,
or prove that no such n exist.
Th-t/Ah-that’s
all,
folks!
A
Answers to Exercises
(The first finder of
every error in this
book will receive
a reward of $2.56.)
Does that mean
I have to find every
error?
(We meant to say
“any error.“)
Does that mean
only one person gets
a reward?
(Hmmm. Try it and
see.)
The number of
intersection points
turns out to give
the whole story;
convexity was a red
herring.
EVERY EXERCISE is answered here (at least briefly), and some of these
answers go beyond what was asked. Readers will learn best if they make a
serious attempt to find their own answers BEFORE PEEKING at this appendix.
The authors will be interested to learn of any solutions (or partial
solutions) to the research problems, or of any simpler (or more correct) ways
to solve the non-research ones.
1.1 The proof is fine except when n = 2. If all sets of two horses have horses of the same color, the statement is true for any number of horses.
1.2 If $X_n$ is the number of moves, we have $X_0 = 0$ and $X_n = X_{n-1} + 1 + X_{n-1} + 1 + X_{n-1}$ when n > 0. It follows (for example by adding 1 to both sides) that $X_n = 3^n - 1$. (After $\frac12 X_n$ moves, it turns out that the entire tower will be on the middle peg, halfway home!)
1.3 There are $3^n$ possible arrangements, since each disk can be on any of the pegs. We must hit them all, since the shortest solution takes $3^n - 1$ moves. (This construction is equivalent to a “ternary Gray code,” which runs through all numbers from $(0\ldots0)_3$ to $(2\ldots2)_3$, changing only one digit at a time.)
1.4 No. If the largest disk doesn’t have to move, $2^{n-1} - 1$ moves will suffice (by induction); otherwise $(2^{n-1} - 1) + 1 + (2^{n-1} - 1)$ will suffice (again by induction).
1.5
No; different circles can intersect in at most two points, so the fourth
circle can increase the number of regions to at most 14. However, it is possible
to do the job with ovals:
[Figure of four intersecting ovals not reproduced.]
Venn [294] claimed that there is no way to do the five-set case with ellipses, but a five-set construction with ellipses was found by Grünbaum [137].
1.6 If the nth line intersects the previous lines in k > 0 distinct points, we get k − 1 new bounded regions (assuming that none of the previous lines were mutually parallel) and two new infinite regions. Hence the maximum number of bounded regions is $(n-2) + (n-3) + \cdots = S_{n-2} = (n-1)(n-2)/2 = L_n - 2n$.
[Margin note: This answer assumes that n > 0.]
1.7 The basis is unproved; and in fact, H(1) ≠ 2.
1.8 $Q_2 = (1+\beta)/\alpha$; $Q_3 = (1+\alpha+\beta)/\alpha\beta$; $Q_4 = (1+\alpha)/\beta$; $Q_5 = \alpha$; $Q_6 = \beta$. So the sequence is periodic!
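The period-5 behavior can be confirmed with exact arithmetic. The sketch below assumes that the recurrence of exercise 1.8 is $Q_n = (1 + Q_{n-1})/Q_{n-2}$ with $Q_0 = \alpha$ and $Q_1 = \beta$; the exercise statement is not part of this section, so that recurrence, the function name, and the sample values α = 3, β = 7 are assumptions made for illustration.

```python
from fractions import Fraction

def q_sequence(alpha, beta, count):
    """First `count` terms of Q_n = (1 + Q_{n-1}) / Q_{n-2} (assumed recurrence)."""
    q = [Fraction(alpha), Fraction(beta)]
    while len(q) < count:
        q.append((1 + q[-1]) / q[-2])
    return q

alpha, beta = Fraction(3), Fraction(7)   # arbitrary sample values
q = q_sequence(alpha, beta, 12)
print(q)
```

With these sample values the terms $Q_2$, $Q_3$, $Q_4$ agree with the three closed forms in the answer, and the list repeats with period 5.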
1.9 (a) We get P(n − 1) from the inequality
$$x_1 \ldots x_{n-1}\Bigl(\frac{x_1 + \cdots + x_{n-1}}{n-1}\Bigr) \le \Bigl(\frac{x_1 + \cdots + x_{n-1}}{n-1}\Bigr)^n.$$
(b) $x_1 \ldots x_n x_{n+1} \ldots x_{2n} \le \bigl(((x_1 + \cdots + x_n)/n)((x_{n+1} + \cdots + x_{2n})/n)\bigr)^n$ by P(n); the product inside is $\le ((x_1 + \cdots + x_{2n})/2n)^2$ by P(2). (c) For example, P(5) follows from P(6) from P(3) from P(4) from P(2).
1.10 First show that $R_n = R_{n-1} + 1 + Q_{n-1} + 1 + R_{n-1}$, when n > 0. Incidentally, the methods of Chapter 7 will tell us that $Q_n = \bigl((1+\sqrt3)^{n+1} - (1-\sqrt3)^{n+1}\bigr)/(2\sqrt3) - 1$.
1.11 (a) We cannot do better than to move a double (n − 1)-tower, then move (and invert the order of) the two largest disks, then move the double (n − 1)-tower again; hence $A_n = 2A_{n-1} + 2$ and $A_n = 2T_n = 2^{n+1} - 2$. This solution interchanges the two largest disks but returns the other 2n − 2 to their original order.
(b) Let $B_n$ be the minimum number of moves. Then $B_1 = 3$, and it can be shown that no strategy does better than $B_n = A_{n-1} + 2 + A_{n-1} + 2 + B_{n-1}$ when n > 1. Hence $B_n = 2^{n+2} - 5$, for all n > 0. Curiously this is just $2A_n - 1$, and we also have $B_n = A_{n-1} + 1 + A_{n-1} + 1 + A_{n-1} + 1 + A_{n-1}$.
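The closed forms in this answer follow from the two recurrences, and a short loop confirms the three expressions for $B_n$ agree. This is an illustrative sketch only; the function name is hypothetical and the base case $A_0 = 0$ (an empty tower needs no moves) is an assumption not stated in the answer.

```python
def double_tower_moves(limit):
    """Iterate A_n = 2 A_{n-1} + 2 and B_n = A_{n-1} + 2 + A_{n-1} + 2 + B_{n-1}."""
    A = [0]                        # A_0 = 0 is an assumed base case
    for n in range(1, limit + 1):
        A.append(2 * A[-1] + 2)
    B = [None, 3]                  # B_1 = 3 as stated; B_0 is unused
    for n in range(2, limit + 1):
        B.append(A[n - 1] + 2 + A[n - 1] + 2 + B[-1])
    return A, B

A, B = double_tower_moves(12)
for n in range(1, 13):
    print(n, A[n], B[n], 2 ** (n + 2) - 5, 2 * A[n] - 1, 4 * A[n - 1] + 3)
```

Each row shows the same value in the last four columns, matching $B_n = 2^{n+2} - 5 = 2A_n - 1 = 4A_{n-1} + 3$.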
1.12 If all $m_k > 0$, then $A(m_1,\ldots,m_n) = 2A(m_1,\ldots,m_{n-1}) + m_n$. This is an equation of the “generalized Josephus” type, with solution $(m_1 \ldots m_n)_2 = 2^{n-1}m_1 + \cdots + 2m_{n-1} + m_n$.
Incidentally, the corresponding generalization of exercise 11b appears to satisfy the recurrence
$$B(m_1,\ldots,m_n) = \begin{cases} A(m_1,\ldots,m_n), & \text{if } m_n = 1;\\ 2m_n - 1, & \text{if } n = 1;\\ 2A(m_1,\ldots,m_{n-1}) + 2m_n + B(m_1,\ldots,m_{n-1}), & \text{if } n > 1 \text{ and } m_n > 1.\end{cases}$$
1.13 Given n straight lines that define $L_n$ regions, we can replace them by extremely narrow zig-zags with segments sufficiently long that there are nine intersections between each pair of zig-zags. This shows that $ZZ_n = ZZ_{n-1} + 9n - 8$, for all n > 0; consequently $ZZ_n = 9S_n - 8n + 1 = \frac92 n^2 - \frac72 n + 1$.
1.14 The number of new 3-dimensional regions defined by each new cut is the number of 2-dimensional regions defined in the new plane by its intersections with the previous planes. Hence $P_n = P_{n-1} + L_{n-1}$, and it turns out that $P_5 = 26$. (Six cuts in a cubical piece of cheese can make 27 cubelets, or up to $P_6 = 42$ cuts of weirder shapes.)
Incidentally, the solution to this recurrence fits into a nice pattern if we express it in terms of binomial coefficients (see Chapter 5):
$$X_n = \binom{n}{0} + \binom{n}{1};$$
$$L_n = \binom{n}{0} + \binom{n}{1} + \binom{n}{2};$$
$$P_n = \binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \binom{n}{3}.$$
Here $X_n$ is the maximum number of 1-dimensional regions definable by n points on a line.
[Margin note: I bet I know what happens in four dimensions!]
1.15 The function I satisfies the same recurrence as J when $n > 1$, but $I(1)$ is undefined. Since $I(2) = 2$ and $I(3) = 1$, there's no value of $I(1) = \alpha$ that will allow us to use our general method; the "end game" of unfolding depends on the two leading bits in n's binary representation.
If $n = 2^m + 2^{m-1} + k$, where $0 \le k < 2^{m+1} + 2^m - (2^m + 2^{m-1}) = 2^m + 2^{m-1}$, the solution is $I(n) = 2k + 1$ for all $n > 2$. Another way to express this, in terms of the representation $n = 2^m + l$, is to say that
$$I(n) = \begin{cases} J(n) + 2^{m-1}, & \text{if } 0 \le l < 2^{m-1};\\ J(n) - 2^m, & \text{if } 2^{m-1} \le l < 2^m. \end{cases}$$
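The closed form for the penultimate survivor can be compared against a direct simulation. This is a minimal sketch, assuming the usual Josephus setup of this chapter (people numbered 1 to n, every second person removed); both function names are illustrative.

```python
def penultimate_survivor(n):
    """Simulate the every-second-person Josephus process and return I(n), for n >= 2."""
    people = list(range(1, n + 1))
    idx = 0
    while len(people) > 2:
        idx = (idx + 1) % len(people)   # skip one person, remove the next
        people.pop(idx)
    # Two people remain; the next one removed is the penultimate survivor.
    idx = (idx + 1) % len(people)
    return people[idx]


def penultimate_closed_form(n):
    """I(n) = 2k + 1, where n = 2^m + 2^(m-1) + k and 0 <= k < 2^m + 2^(m-1); needs n >= 3."""
    m = 1
    while 2 ** (m + 1) + 2 ** m <= n:
        m += 1
    k = n - 2 ** m - 2 ** (m - 1)
    return 2 * k + 1


for n in range(3, 200):
    assert penultimate_survivor(n) == penultimate_closed_form(n)
```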
1.16 Let $g(n) = a(n)\alpha + b(n)\beta_0 + c(n)\beta_1 + d(n)\gamma$. We know from (1.18) that $a(n)\alpha + b(n)\beta_0 + c(n)\beta_1 = (\alpha\,\beta_{b_{m-1}}\beta_{b_{m-2}} \ldots \beta_{b_1}\beta_{b_0})_3$ when $n = (1\,b_{m-1} \ldots b_1 b_0)_2$; this defines $a(n)$, $b(n)$, and $c(n)$. Setting $g(n) = n$ in the recurrence implies that $a(n) + c(n) - d(n) = n$; hence we know everything. [Setting $g(n) = 1$ gives the additional identity $a(n) - 2b(n) - 2c(n) = 1$, which can be used to define $b(n)$ in terms of the simpler functions $a(n)$ and $a(n) + c(n)$.]
1.17 In general we have $W_m \le 2W_{m-k} + T_k$, for $0 \le k \le m$. (This relation corresponds to transferring the top $m - k$, then using only three pegs to move the bottom k, then finishing with the top $m - k$.) The stated relation turns out to be based on the unique value of k that minimizes the right-hand side of this general inequality, when $m = n(n+1)/2$. (However, we cannot conclude that equality holds; many other strategies for transferring the tower are conceivable.) If we set $Y_n = (W_{n(n+1)/2} - 1)/2^n$, we find that $Y_n \le Y_{n-1} + 1$; hence $W_{n(n+1)/2} \le 2^n(n-1) + 1$.
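The inequality above suggests a simple dynamic program: minimizing the right-hand side over k gives an upper bound on $W_m$, because every choice of k corresponds to a legal strategy. The sketch below computes that bound (called `U` here, a name not used in the book) and checks it against $2^n(n-1) + 1$; it does not claim that `U` equals $W_m$.

```python
def four_peg_upper_bound(limit):
    """U[m] = min over 1 <= k <= m of 2*U[m-k] + T_k, with T_k = 2^k - 1 and U[0] = 0."""
    U = [0]
    for m in range(1, limit + 1):
        U.append(min(2 * U[m - k] + 2 ** k - 1 for k in range(1, m + 1)))
    return U


U = four_peg_upper_bound(66)
for n in range(0, 12):
    m = n * (n + 1) // 2
    assert U[m] <= 2 ** n * (n - 1) + 1   # the bound stated in the answer
```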
1.18 It suffices to show that both of the lines from $(n^{2j}, 0)$ intersect both of the lines from $(n^{2k}, 0)$, and that all these intersection points are distinct.
A line from $(x_j, 0)$ through $(x_j - a_j, 1)$ intersects a line from $(x_k, 0)$ through $(x_k - a_k, 1)$ at the point $(x_j - t a_j, t)$ where $t = (x_k - x_j)/(a_k - a_j)$. Let $x_j = n^{2j}$ and $a_j = n^j + (0 \text{ or } n^{-n})$. Then the ratio $t = (n^{2k} - n^{2j})/(n^k - n^j + (-n^{-n} \text{ or } 0 \text{ or } n^{-n}))$ lies strictly between $n^j + n^k - 1$ and $n^j + n^k + 1$; hence the y coordinate of the intersection point uniquely identifies j and k. Also the four intersections that have the same j and k are distinct.
1.19 Not when $n > 11$. A bent line whose half-lines run at angles $\theta$ and $\theta + 30^\circ$ from its apex can intersect four times with another whose half-lines run at angles $\phi$ and $\phi + 30^\circ$ only if $|\theta - \phi| > 30^\circ$. We can't choose more than 11 angles this far apart from each other. (Is it possible to choose 11?)
1.20 Let $h(n) = a(n)\alpha + b(n)\beta_0 + c(n)\beta_1 + d(n)\gamma_0 + e(n)\gamma_1$. We know from (1.18) that $a(n)\alpha + b(n)\beta_0 + c(n)\beta_1 = (\alpha\,\beta_{b_{m-1}}\beta_{b_{m-2}} \ldots \beta_{b_1}\beta_{b_0})_4$ when $n = (1\,b_{m-1} \ldots b_1 b_0)_2$; this defines $a(n)$, $b(n)$, and $c(n)$. Setting $h(n) = n$ in the recurrence implies that $a(n) + c(n) - 2d(n) - 2e(n) = n$; setting $h(n) = n^2$ implies that $a(n) + c(n) + 4e(n) = n^2$. Hence $d(n) = (3a(n) + 3c(n) - n^2 - 2n)/4$; $e(n) = (n^2 - a(n) - c(n))/4$.
1.21 We can let m be the least (or any) common multiple of $2n, 2n-1, \ldots, n+1$. [A non-rigorous argument suggests that a "random" value of m will succeed with probability
$$\frac{n}{2n}\,\frac{n-1}{2n-1}\cdots\frac{1}{n+1} = 1\Big/\binom{2n}{n} \sim \frac{\sqrt{\pi n}}{4^n},$$
so we might expect to find such an m less than $4^n$.]
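The choice of m in answer 1.21 can be tried out directly. The sketch below assumes the setup of the exercise as usually stated (2n people in a circle, numbers 1 to n the "good guys" and n+1 to 2n the "bad guys", every m-th person executed); the helper names are illustrative.

```python
from functools import reduce
from math import gcd


def lcm_of(values):
    """Least common multiple of an iterable of positive integers."""
    return reduce(lambda a, b: a // gcd(a, b) * b, values, 1)


def execution_order(total, step):
    """Order in which people 1..total are removed when every step-th person goes."""
    people = list(range(1, total + 1))
    order, idx = [], 0
    while people:
        idx = (idx + step - 1) % len(people)
        order.append(people.pop(idx))
    return order


def bad_guys_go_first(n):
    m = lcm_of(range(n + 1, 2 * n + 1))
    order = execution_order(2 * n, m)
    return all(person > n for person in order[:n])


assert all(bad_guys_go_first(n) for n in range(1, 9))
```

With this m the count always lands on the last person still standing, so people 2n, 2n-1, ..., n+1 are removed in that order.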
1.22 Take a regular polygon with $2^n$ sides and label the sides with the elements of a "de Bruijn cycle" of length $2^n$. (This is a cyclic sequence of 0's and 1's in which all n-tuples of adjacent elements are different; see [173, exercise 2.3.4.2-23] and [174, exercise 3.2.2-17].) Attach a very thin convex extension to each side that's labeled 1. The n sets are copies of the resulting polygon, rotated by the length of k sides for $k = 0, 1, \ldots, n-1$.
I once rode a de Bruijn cycle (when visiting at his home in Nuenen, The Netherlands).
1.23 Yes. (We need principles of elementary number theory from Chapter 4.) Let $L(n) = \mathrm{lcm}(1, 2, \ldots, n)$. We can assume that $n > 2$; hence by Bertrand's postulate there is a prime p between $n/2$ and n. We can also assume that $j > n/2$, since $q' = L(n) + 1 - q$ leaves $j' = n + 1 - j$ if and only if q leaves j. Choose q so that $q \equiv 1 \pmod{L(n)/p}$ and $q \equiv j + 1 - n \pmod p$. The people are now removed in order $1, 2, \ldots, n-p, j+1, j+2, \ldots, n, n-p+1, \ldots, j-1$.
1.24 The only known examples are: $X_n = a/X_{n-1}$, which has period 2; R. C. Lyness's recurrence of period 5 in exercise 8; H. Todd's recurrence $X_n = (1 + X_{n-1} + X_{n-2})/X_{n-3}$, which has period 8; and recurrences derived from these by substitutions of the form $Y_n = \alpha X_{mn}$. An exhaustive search by Bill Gosper turned up no nontrivial solutions of period 4 when $k = 2$. A partial theory has been developed by Lyness [210] and by Kurshan and Gopinath [189]. An interesting example of another type, with period 9 when the starting values are real, is the recurrence $X_n = |X_{n-1}| - X_{n-2}$ discovered by Morton Brown [38]. Nonlinear recurrences having any desired period $\ge 5$ can be based on continuants [55].
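The periods quoted in answer 1.24 are easy to observe with exact rational arithmetic. The sketch below is only a spot check on particular starting values, not a proof; the helper names and the constant a = 7 are illustrative choices.

```python
from fractions import Fraction


def orbit(step, start, length):
    """Extend the starting values with `step` until `length` terms are available."""
    xs = list(start)
    while len(xs) < length:
        xs.append(step(xs))
    return xs


def is_periodic(step, start, period):
    xs = orbit(step, start, len(start) + 2 * period)
    return all(xs[i] == xs[i + period] for i in range(len(xs) - period))


def reciprocal(xs):          # X_n = a / X_{n-1}, here with a = 7
    return Fraction(7) / xs[-1]


def todd(xs):                # X_n = (1 + X_{n-1} + X_{n-2}) / X_{n-3}
    return (1 + xs[-1] + xs[-2]) / xs[-3]


def brown(xs):               # X_n = |X_{n-1}| - X_{n-2}
    return abs(xs[-1]) - xs[-2]


assert is_periodic(reciprocal, [Fraction(3)], 2)
assert is_periodic(todd, [Fraction(1), Fraction(2), Fraction(3)], 8)
assert is_periodic(brown, [Fraction(3, 7), Fraction(-5, 2)], 9)
```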
1.25 If $T^{(k)}(n)$ denotes the minimum number of moves needed to transfer n disks with k auxiliary pegs (hence $T^{(1)}(n) = T_n$ and $T^{(2)}(n) = W_n$), we have $T^{(k)}\bigl(\binom{n+1}{k}\bigr) \le 2T^{(k)}\bigl(\binom{n}{k}\bigr) + T^{(k-1)}\bigl(\binom{n}{k-1}\bigr)$. No examples (n, k) are known where this inequality fails to be an equality. When k is small compared with n, the formula $2^{n+1-k}\binom{n-1}{k-1}$ gives a convenient (but non-optimum) upper bound on $T^{(k)}\bigl(\binom{n}{k}\bigr)$.
1.26 The execution-order permutation can be computed in $O(n \log n)$ steps for all m and n [175, exercises 5.1.1-2 and 5.1.1-5]. Bjorn Poonen [241] has proved that non-Josephus sets with exactly four "bad guys" exist whenever $n \equiv 0 \pmod 3$ and $n \ge 9$; in fact, the number of such sets is at least $\epsilon\binom{n}{4}$ for some $\epsilon > 0$. He also found by extensive computations that the only other $n < 24$ with non-Josephus sets is $n = 20$, which has 236 such sets with $k = 14$ and two with $k = 13$. (One of the latter is {1,2,3,4,5,6,7,8,11,14,15,16,17}; the other is its reflection with respect to 21.) There is a unique non-Josephus set with $n = 15$ and $k = 9$, namely {3,4,5,6,8,10,11,12,13}.
2.1 There's no agreement about this; three answers are defensible: (1) We can say that $\sum_{k=m}^{n} q_k$ is always equivalent to $\sum_{m \le k \le n} q_k$; then the stated sum is zero. (2) A person might say that the given sum is $q_4 + q_3 + q_2 + q_1 + q_0$, by summing over decreasing values of k. But this conflicts with the generally accepted convention that $\sum_{k=1}^{n} q_k = 0$ when $n = 0$. (3) We can say that $\sum_{k=m}^{n} q_k = \sum_{k \le n} q_k - \sum_{k < m} q_k$; then the stated sum is $-q_1 - q_2 - q_3$. This convention may appear strange, but it obeys the useful law $\sum_{k=a}^{b} + \sum_{k=b+1}^{c} = \sum_{k=a}^{c}$ for all a, b, c.
It's best to use the notation $\sum_{k=m}^{n}$ only when $n - m \ge -1$; then both conventions (1) and (3) agree.
2.2 This is $|x|$. Incidentally, the quantity $([x > 0] - [x < 0])$ is often called sign(x) or signum(x); it is +1 when $x > 0$, 0 when $x = 0$, and -1 when $x < 0$.
2.3 The first sum is, of course, $a_0 + a_1 + a_2 + a_3 + a_4 + a_5$; the second is $a_4 + a_1 + a_0 + a_1 + a_4$, because the sum is over the values $k \in \{-2, -1, 0, +1, +2\}$. The commutative law doesn't hold here because the function $p(k) = k^2$ is not a permutation. Some values of n (e.g., $n = 3$) have no k such that $p(k) = n$; others (e.g., $n = 4$) have two such k.
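2.4 (a) $\sum_{i=1}^{4}\sum_{j=i+1}^{4}\sum_{k=j+1}^{4} a_{ijk} = \sum_{i=1}^{2}\sum_{j=i+1}^{3}\sum_{k=j+1}^{4} a_{ijk} = ((a_{123} + a_{124}) + a_{134}) + a_{234}$.
(b) $\sum_{k=1}^{4}\sum_{j=1}^{k-1}\sum_{i=1}^{j-1} a_{ijk} = \sum_{k=3}^{4}\sum_{j=2}^{k-1}\sum_{i=1}^{j-1} a_{ijk} = a_{123} + (a_{124} + (a_{134} + a_{234}))$.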
2.5 The same index 'k' is being used for two different index variables, although k is bound in the inner sum. This is a famous mistake in mathematics (and computer programming). The result turns out to be correct if $a_j = a_k$ for all j and k, $1 \le j, k \le n$.
2.6 It's $[1 \le j \le n](n - j + 1)$. The first factor is necessary here because we should get zero when $j < 1$ or $j > n$.
2.7 $m x^{\overline{m-1}}$. A version of finite calculus based on $\nabla$ instead of $\Delta$ would therefore give special prominence to rising factorial powers.
2.8 0, if $m \ge 1$; $1/|m|!$, if $m \le 0$.
2.9 $x^{\overline{m+n}} = x^{\overline{m}}(x + m)^{\overline{n}}$, for integers m and n. Setting $m = -n$ tells us that $x^{\overline{-n}} = 1/(x - n)^{\overline{n}} = 1/(x - 1)^{\underline{n}}$.
2.10 Another possible right-hand side is $Eu\,\Delta v + v\,\Delta u$.
2.11 Break the left-hand side into two sums, and change k to k + 1 in the
second of these.
2.12 If $p(k) = n$ then $n + c = k + ((-1)^k + 1)c$ and $((-1)^k + 1)$ is even; hence $(-1)^{n+c} = (-1)^k$ and $k = n - (-1)^{n+c}c$. Conversely, this value of k yields $p(k) = n$.
2.13 Let $R_0 = \alpha$, and $R_n = R_{n-1} + (-1)^n(\beta + n\gamma + n^2\delta)$ for $n > 0$. Then $R(n) = A(n)\alpha + B(n)\beta + C(n)\gamma + D(n)\delta$. Setting $R_n = 1$ yields $A(n) = 1$. Setting $R_n = (-1)^n$ yields $A(n) + 2B(n) = (-1)^n$. Setting $R_n = (-1)^n n$ yields $-B(n) + 2C(n) = (-1)^n n$. Setting $R_n = (-1)^n n^2$ yields $B(n) - 2C(n) + 2D(n) = (-1)^n n^2$. Therefore $2D(n) = (-1)^n(n^2 + n)$; the stated sum is $D(n)$.
2.14 The suggested rewrite is legitimate since we have $k = \sum_{1 \le j \le k} 1$ when $1 \le k \le n$. Sum first on k; the multiple sum reduces to
$$\sum_{1 \le j \le n} (2^{n+1} - 2^j) = n2^{n+1} - (2^{n+1} - 2).$$
“It is a profoundly erroneous truism, repeated by all copybooks and by eminent people when they are making speeches, that we should cultivate the habit of thinking of what we are doing. The precise opposite is the case. Civilization advances by extending the number of important operations which we can perform without thinking about them. Operations of thought are like cavalry charges in a battle - they are strictly limited in number, they require fresh horses, and must only be made at decisive moments.”
-A. N. Whitehead
/3&Z]
2.15 The first step replaces $k(k + 1)$ by $2\sum_{1 \le j \le k} j$. The second step gives $\boxdot_n + \Box_n = \bigl(\sum_{k=1}^{n} k\bigr)^2 + \Box_n$.
2.16 $x^{\underline{m}}(x - m)^{\underline{n}} = x^{\underline{m+n}} = x^{\underline{n}}(x - n)^{\underline{m}}$, by (2.52).
2.17 Use induction for the first two ='s, and (2.52) for the third. The second line follows from the first.
2.18 Use the facts that $(\Re z)^+ \le |z|$, $(\Re z)^- \le |z|$, $(\Im z)^+ \le |z|$, $(\Im z)^- \le |z|$, and $|z| \le (\Re z)^+ + (\Re z)^- + (\Im z)^+ + (\Im z)^-$.
2.19 Multiply both sides by $2^{n-1}/n!$ and let $S_n = 2^n T_n/n! = S_{n-1} + 3\cdot 2^{n-1} = 3(2^n - 1) + S_0$. The solution is $T_n = 3\cdot n! + n!/2^{n-1}$. (We'll see in Chapter 4 that $T_n$ is an integer only when n is 0 or a power of 2.)
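The solution of answer 2.19 can be compared with the recurrence it came from. This is a minimal sketch; the recurrence $T_0 = 5$, $2T_n = nT_{n-1} + 3\cdot n!$ is quoted here from the exercise statement (it is not restated in the answer), and the function names are illustrative.

```python
from fractions import Fraction
from math import factorial


def t_by_recurrence(limit):
    """T_0 = 5 and 2 T_n = n T_{n-1} + 3 n!, computed exactly."""
    T = [Fraction(5)]
    for n in range(1, limit + 1):
        T.append((n * T[n - 1] + 3 * factorial(n)) / 2)
    return T


def t_closed_form(n):
    """T_n = 3 n! + n!/2^(n-1), written so that n = 0 needs no special case."""
    return 3 * factorial(n) + Fraction(2 * factorial(n), 2 ** n)


T = t_by_recurrence(40)
for n in range(41):
    assert T[n] == t_closed_form(n)
    is_zero_or_power_of_two = (n & (n - 1)) == 0
    assert (T[n].denominator == 1) == is_zero_or_power_of_two
```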
2.20 The perturbation method gives $S_n + (n + 1)H_{n+1} = S_n + \bigl(\sum_{0 \le k \le n} H_k\bigr) + n + 1$.
2.21 Extracting the final term of $S_{n+1}$ gives $S_{n+1} = 1 - S_n$; extracting the first term gives
$$S_{n+1} = (-1)^{n+1} + \sum_{1 \le k \le n+1} (-1)^{n+1-k} = (-1)^{n+1} + \sum_{0 \le k \le n} (-1)^{n-k} = (-1)^{n+1} + S_n.$$
Hence $2S_n = 1 + (-1)^n$ and we have $S_n = [n \text{ is even}]$. Similarly, we find
$$T_{n+1} = n + 1 - T_n = \sum_{k=0}^{n} (-1)^{n-k}(k + 1) = T_n + S_n,$$
hence $2T_n = n + 1 - S_n$ and we have $T_n = \frac12(n + [n \text{ is odd}])$. Finally, the same approach yields
$$U_{n+1} = (n + 1)^2 - U_n = U_n + 2T_n + S_n = U_n + n + [n \text{ is odd}] + [n \text{ is even}] = U_n + n + 1.$$
Hence $U_n$ is the triangular number $\frac12(n + 1)n$.
2.22 Twice the sum gives a "vanilla" sum over $1 \le j, k \le n$, which splits into three sums that can be handled easily.
2.23 (a) This approach gives four sums that evaluate to $2n + H_n - 2n + (H_n + \frac{1}{n+1} - 1)$. (It would have been easier to replace the summand by $1/k + 1/(k+1)$.) (b) Let $u(x) = 2x + 1$ and $\Delta v(x) = 1/x(x+1) = (x-1)^{\underline{-2}}$; then $\Delta u(x) = 2$ and $v(x) = -(x-1)^{\underline{-1}} = -1/x$. The answer is $2H_n - \frac{n}{n+1}$.
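2.24 Summing by parts, $\sum x^{\underline{m}} H_x\,\delta x = x^{\underline{m+1}} H_x/(m+1) - x^{\underline{m+1}}/(m+1)^2 + C$; hence $\sum_{0 \le k < n} k^{\underline{m}} H_k = n^{\underline{m+1}}\bigl(H_n - 1/(m+1)\bigr)/(m+1) + 0^{\underline{m+1}}/(m+1)^2$. In our case $m = -2$, so the sum comes to $1 - (H_n + 1)/(n + 1)$.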
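2.25 Here are some of the basic analogies:
$\sum_{k \in K} c\,a_k = c\sum_{k \in K} a_k \;\longleftrightarrow\; \prod_{k \in K} a_k^{\,c} = \bigl(\prod_{k \in K} a_k\bigr)^c$
$\sum_{k \in K} (a_k + b_k) = \sum_{k \in K} a_k + \sum_{k \in K} b_k \;\longleftrightarrow\; \prod_{k \in K} a_k b_k = \bigl(\prod_{k \in K} a_k\bigr)\bigl(\prod_{k \in K} b_k\bigr)$
$\sum_{k \in K} a_k = \sum_{p(k) \in K} a_{p(k)} \;\longleftrightarrow\; \prod_{k \in K} a_k = \prod_{p(k) \in K} a_{p(k)}$
$\sum_{k \in K} a_k = \sum_{k} a_k[k \in K] \;\longleftrightarrow\; \prod_{k \in K} a_k = \prod_{k} a_k^{[k \in K]}$
$\sum_{k \in K} 1 = \#K \;\longleftrightarrow\; \prod_{k \in K} c = c^{\#K}$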
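2.26 $P^2 = \bigl(\prod_{1 \le j,k \le n} a_j a_k\bigr)\bigl(\prod_{1 \le j = k \le n} a_j a_k\bigr)$. The first factor is $\bigl(\prod_{k=1}^{n} a_k^n\bigr)^2$; the second factor is $\prod_{k=1}^{n} a_k^2$. Hence $P = \bigl(\prod_{k=1}^{n} a_k\bigr)^{n+1}$.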
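2.27 $\Delta(c^{\underline{x}}) = c^{\underline{x}}(c - x - 1) = c^{\underline{x+2}}/(c - x)$. Setting $c = -2$ and decreasing x by 2 yields $\Delta\bigl(-(-2)^{\underline{x-2}}\bigr) = (-2)^{\underline{x}}/x$, hence the stated sum is $(-2)^{\underline{-1}} - (-2)^{\underline{n-1}} = (-1)^n n! - 1$.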
2.28 The interchange of summation between the second and third lines is not justifiable; the terms of this sum do not converge absolutely. Everything else is perfectly correct, except that the result of $\sum_{k \ge 1} [k = j - 1]\,k/j$ should perhaps have been written $[j - 1 \ge 1](j - 1)/j$ and simplified explicitly.
As opposed to imperfectly correct.
2.29 Use partial fractions to get
$$\frac{k}{4k^2 - 1} = \frac14\Bigl(\frac{1}{2k+1} + \frac{1}{2k-1}\Bigr).$$
The $(-1)^k$ factor now makes the two halves of each term cancel with their neighbors. Hence the answer is $-1/4 + (-1)^n/(8n + 4)$.
2.30 $\sum_a^b x\,\delta x = \frac12(b^{\underline{2}} - a^{\underline{2}}) = \frac12(b - a)(b + a - 1)$. So we have $(b - a)(b + a - 1) = 2100 = 2^2\cdot 3\cdot 5^2\cdot 7$. There is one solution for each way to write $2100 = x\cdot y$ where x is even and y is odd; we let $a = \frac12|x - y| + \frac12$ and $b = \frac12(x + y) + \frac12$. So the number of solutions is the number of divisors of $3\cdot 5^2\cdot 7$, namely 12. In general, there are $\prod_{p>2}(n_p + 1)$ ways to represent $\prod_p p^{n_p}$, where the products range over primes.
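The count of 12 can be confirmed by brute force. This sketch assumes the exercise asks for representations of 1050 as a sum of consecutive positive integers $a + (a+1) + \cdots + (b-1)$, which is what $\sum_a^b x\,\delta x = 1050$ encodes; the function names are illustrative.

```python
def consecutive_representations(total):
    """All pairs (a, b) with a + (a+1) + ... + (b-1) == total and 1 <= a < b."""
    reps = []
    for a in range(1, total + 1):
        running, b = 0, a
        while running < total:
            running += b
            b += 1
        if running == total:
            reps.append((a, b))
    return reps


def odd_divisor_count(n):
    return sum(1 for d in range(1, n + 1, 2) if n % d == 0)


reps = consecutive_representations(1050)
assert len(reps) == 12 == odd_divisor_count(1050)
# Each pair satisfies (b - a)(b + a - 1) = 2100, as in the answer.
assert all((b - a) * (b + a - 1) == 2100 for a, b in reps)
```

The single-term representation (a, b) = (1050, 1051) is included in the 12, matching the factorization x = 2100, y = 1.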
2.31 $\sum_{j,k \ge 2} j^{-k} = \sum_{j \ge 2} 1/j^2(1 - 1/j) = \sum_{j \ge 2} 1/j(j - 1)$. The second sum is, similarly, 3/4.
2.32 If $2n \le x < 2n + 1$, the sums are $0 + \cdots + n + (x - n - 1) + \cdots + (x - 2n) = n(x - n) = (x - 1) + (x - 3) + \cdots + (x - 2n + 1)$. If $2n - 1 \le x < 2n$ they are, similarly, both equal to $n(x - n)$. (Looking ahead to Chapter 3, the formula $\lfloor\frac12(x + 1)\rfloor\bigl(x - \lfloor\frac12(x + 1)\rfloor\bigr)$ covers both cases.)
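2.33 If K is empty, $\bigwedge_{k \in K} a_k = \infty$. The basic analogies are:
$\sum_{k \in K} c\,a_k = c\sum_{k \in K} a_k \;\longleftrightarrow\; \bigwedge_{k \in K} (c + a_k) = c + \bigwedge_{k \in K} a_k$
$\sum_{k \in K} (a_k + b_k) = \sum_{k \in K} a_k + \sum_{k \in K} b_k \;\longleftrightarrow\; \bigwedge_{k \in K} \min(a_k, b_k) = \min\bigl(\bigwedge_{k \in K} a_k, \bigwedge_{k \in K} b_k\bigr)$
$\sum_{k \in K} a_k = \sum_{p(k) \in K} a_{p(k)} \;\longleftrightarrow\; \bigwedge_{k \in K} a_k = \bigwedge_{p(k) \in K} a_{p(k)}$
$\sum_{j \in J,\,k \in K} a_{j,k} = \sum_{j \in J}\sum_{k \in K} a_{j,k} \;\longleftrightarrow\; \bigwedge_{j \in J,\,k \in K} a_{j,k} = \bigwedge_{j \in J}\bigwedge_{k \in K} a_{j,k}$
$\sum_{k \in K} a_k = \sum_{k} a_k[k \in K] \;\longleftrightarrow\; \bigwedge_{k \in K} a_k = \bigwedge_{k} a_k\cdot\infty^{[k \notin K]}$
2.34 Let $K^+ = \{k \mid a_k \ge 0\}$ and $K^- = \{k \mid a_k < 0\}$. Then if, for example, n is odd, we choose $F_n$ to be $F_{n-1} \cup E_n$, where $E_n \subseteq K^-$ is sufficiently large that $\sum_{k \in (F_{n-1} \cap K^+)} a_k - \sum_{k \in E_n} (-a_k) < A^-$.
A permutation that consumes terms of one sign faster than those of the other can steer the sum toward any value that it likes.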
2.35 Goldbach's sum can be shown to equal
$$\sum_{m,n \ge 2} m^{-n} = 1$$
as follows: By unsumming a geometric series, it equals $\sum_{k \in P,\,l \ge 1} k^{-l}$; therefore the proof will be complete if we can find a one-to-one correspondence between ordered pairs (m, n) with $m, n \ge 2$ and ordered pairs (k, l) with $k \in P$ and $l \ge 1$, where $m^n = k^l$ when the pairs correspond. If $m \notin P$ we let $(m, n) \longleftrightarrow (m^n, 1)$; but if $m = a^b \in P$, we let $(m, n) \longleftrightarrow (a^n, b)$.
2.36 (a) By definition, $g(n) - g(n-1) = f(n)$. (b) By part (a), $g(g(n)) - g(g(n-1)) = \sum_k f(k)\,[g(n-1) < k \le g(n)] = n\bigl(g(n) - g(n-1)\bigr) = n f(n)$. (c) By part (a) again, $g(g(g(n))) - g(g(g(n-1)))$ is
$\sum_k f(k)\,[g(g(n-1)) < k \le g(g(n))]$
$= \sum_{j,k} j\,[j = f(k)]\,[g(g(n-1)) < k \le g(g(n))]$
$= \sum_{j,k} j\,[j = f(k)]\,[g(n-1) < j \le g(n)]$
$= \sum_j j\bigl(g(j) - g(j-1)\bigr)[g(n-1) < j \le g(n)]$
$= \sum_j j f(j)\,[g(n-1) < j \le g(n)] = n\sum_j j\,[g(n-1) < j \le g(n)]$.
Colin Mallows observes that the sequence can also be defined by the recurrence
$f(1) = 1$; $f(n + 1) = 1 + f(n + 1 - f(f(n)))$, for $n \ge 0$.
With this self-description, Golomb's sequence wouldn't do too well on the Dating Game.
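Mallows's recurrence can be compared with the self-describing definition of the sequence. The sketch below assumes the exercise's definition, namely that f is nondecreasing and f(n) is the number of times n occurs; it applies the recurrence from n = 1 onward, and the function names are illustrative.

```python
def golomb_self_describing(count):
    """First `count` terms of the nondecreasing sequence in which n occurs f(n) times."""
    seq = [1, 2, 2]
    n = 3
    while len(seq) < count:
        seq.extend([n] * seq[n - 1])   # n occurs f(n) times
        n += 1
    return seq[:count]


def golomb_mallows(count):
    """First `count` terms via f(1) = 1, f(n+1) = 1 + f(n + 1 - f(f(n)))."""
    f = [0, 1]                          # f[0] is unused padding
    for n in range(1, count):
        f.append(1 + f[n + 1 - f[f[n]]])
    return f[1:count + 1]


assert golomb_mallows(500) == golomb_self_describing(500)
```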
2.37 (RLG thinks they probably won’t fit; DEK thinks they probably will;
OP is not committing himself.)
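3.1 $m = \lfloor\lg n\rfloor$; $l = n - 2^m = n - 2^{\lfloor\lg n\rfloor}$.
3.2 (a) $\lfloor x + .5\rfloor$. (b) $\lceil x - .5\rceil$.
3.3 This is $\lfloor mn - \{m\alpha\}n/\alpha\rfloor = mn - 1$, since $0 < \{m\alpha\} < 1$.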
3.4 Something where no proof is required, only a lucky guess (I guess).
3.5 We have $\lfloor nx\rfloor = n\lfloor x\rfloor \iff n\lfloor x\rfloor \le \lfloor nx\rfloor < n\lfloor x\rfloor + 1 \iff n\lfloor x\rfloor \le nx < n\lfloor x\rfloor + 1 \iff nx - n\{x\} \le nx < nx - n\{x\} + 1$, by (3.5(a)), (3.7(a)), (3.7(d)), and (3.8); and this is equivalent to $n\{x\} < 1$, when n is a positive integer. (Notice that $n\lfloor x\rfloor \le \lfloor nx\rfloor$ for all x in this case.)
3.6 $\lfloor f(x)\rfloor = \lfloor f(\lceil x\rceil)\rfloor$.
3.7 $\lfloor n/m\rfloor + n \bmod m$.
3.8 If all boxes contain $< \lceil n/m\rceil$ objects, then $n \le (\lceil n/m\rceil - 1)m$, so $n/m + 1 \le \lceil n/m\rceil$, contradicting (3.5). The other proof is similar.
3.9 We have $m/n - 1/q = (n \text{ mumble } m)/qn$. The process must terminate, because $0 \le n \text{ mumble } m < m$. The denominators of the representation are strictly increasing, hence distinct, because $qn/(n \text{ mumble } m) > q$.
3.10 $\lceil x + \frac12\rceil - [(2x + 1)/4 \text{ is not an integer}]$ is the nearest integer to x, if $\{x\} \ne \frac12$; otherwise it's the nearest even integer. (See exercise 2.) Thus the formula gives an "unbiased" way to round.
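The rounding formula of answer 3.10 can be compared with round-half-to-even on exact rationals. This is a minimal sketch; it relies on the fact that Python's built-in `round` applied to a `Fraction` rounds halves to the nearest even integer, and the function name is illustrative.

```python
from fractions import Fraction
from math import ceil


def unbiased_round(x):
    """ceil(x + 1/2) - [(2x + 1)/4 is not an integer], for a Fraction x."""
    quarter = (2 * x + 1) / 4
    not_an_integer = 0 if quarter.denominator == 1 else 1
    return ceil(x + Fraction(1, 2)) - not_an_integer


for k in range(-40, 41):
    x = Fraction(k, 8)
    assert unbiased_round(x) == round(x)
```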
3.11 If n is an integer, $\alpha < n < \beta \iff \lfloor\alpha\rfloor < n < \lceil\beta\rceil$. The number of integers satisfying $a < n < b$ when a and b are integers is $(b - a - 1)[b > a]$. We would therefore get the wrong answer if $\alpha = \beta = \text{integer}$.
3.12 Subtract $\lfloor n/m\rfloor$ from both sides, by (3.6), getting $\lceil(n \bmod m)/m\rceil = \lfloor(n \bmod m + m - 1)/m\rfloor$. Both sides are now equal to $[n \bmod m > 0]$, since $0 \le n \bmod m < m$.
A shorter but less direct proof simply observes that the first term in (3.24) must equal the last term in (3.25).
3.13 If they form a partition, the text's formula for $N(\alpha, n)$ implies that $1/\alpha + 1/\beta = 1$, because the coefficients of n in the equation $N(\alpha, n) + N(\beta, n) = n$ must agree if the equation is to hold for large n. Hence $\alpha$ and $\beta$ are both rational or both irrational. If both are irrational, we do get a partition, as shown in the text. If both can be written with numerator m, the value $m - 1$ occurs in neither spectrum. (However, Golomb [121] has observed that the sets $\{\lfloor n\alpha\rfloor \mid n \ge 1\}$ and $\{\lceil n\beta\rceil - 1 \mid n \ge 1\}$ always do form a partition, when $1/\alpha + 1/\beta = 1$.)
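Both halves of answer 3.13 can be illustrated on small cases. The sketch below uses $\alpha = \sqrt2$, $\beta = 2 + \sqrt2$ for the irrational case (an illustrative choice with $1/\alpha + 1/\beta = 1$, evaluated exactly through integer square roots) and $\alpha = 3/2$, $\beta = 3$ for the rational case; the function names are not from the book.

```python
from fractions import Fraction
from math import ceil, floor, isqrt


def irrational_pair(limit):
    """Spec(sqrt 2) and Spec(2 + sqrt 2) merged, using floor(n*sqrt 2) = isqrt(2 n^2)."""
    a = [isqrt(2 * n * n) for n in range(1, limit + 1)]
    b = [2 * n + isqrt(2 * n * n) for n in range(1, limit + 1)]
    return sorted(v for v in a + b if v <= limit)


def golomb_pair(alpha, beta, limit):
    """{floor(n*alpha)} merged with {ceil(n*beta) - 1}, for rational alpha and beta."""
    a = [floor(n * alpha) for n in range(1, limit + 1)]
    b = [ceil(n * beta) - 1 for n in range(1, limit + 1)]
    return sorted(v for v in a + b if v <= limit)


assert irrational_pair(500) == list(range(1, 501))
assert golomb_pair(Fraction(3, 2), Fraction(3), 300) == list(range(1, 301))

# With plain floors, the rational pair 3/2 and 3 (numerator m = 3) misses m - 1 = 2.
plain = {floor(n * Fraction(3, 2)) for n in range(1, 50)} | {3 * n for n in range(1, 50)}
assert 2 not in plain
```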
3.14 It's obvious if $ny = 0$, otherwise true by (3.21) and (3.6).
3.15 Plug in $\lceil mx\rceil$ for n in (3.24): $\lceil mx\rceil = \lceil x\rceil + \lceil x - \frac1m\rceil + \cdots + \lceil x - \frac{m-1}{m}\rceil$.
3.16 The formula $n \bmod 3 = 1 + \frac13\bigl((\omega - 1)\omega^n - (\omega + 2)\omega^{2n}\bigr)$ can be verified by checking it when $0 \le n < 3$.
A general formula for $n \bmod m$, when m is any positive integer, appears in exercise 7.25.
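3.17 $\sum_{j,k} [0 \le k < m][1 \le j \le x + k/m] = \sum_{j,k} [0 \le k < m][1 \le j \le \lceil x\rceil][k \ge m(j - x)] = \sum_{1 \le j \le \lceil x\rceil}\sum_k [0 \le k < m] - \sum_{j = \lceil x\rceil}\sum_k [0 \le k < m(j - x)] = m\lceil x\rceil - \lceil m(\lceil x\rceil - x)\rceil = -\lceil -mx\rceil = \lfloor mx\rfloor$.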
3.18 We have
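If $j \le n\alpha - 1 \le n\alpha - v$, there is no contribution, because $(j + v)\alpha^{-1} \le n$. Hence $j = \lfloor n\alpha\rfloor$ is the only case that matters, and the value in that case equals $\lceil(\lfloor n\alpha\rfloor + v)\alpha^{-1}\rceil - n \le \lceil v\alpha^{-1}\rceil$.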
3.19 If and only if b is an integer. (If b is an integer, $\log_b x$ is a continuous, increasing function that takes integer values only at integer points. If b is not an integer, the condition fails when $x = b$.)
3.20 We have $\sum_k kx\,[\alpha \le kx \le \beta] = x\sum_k k\,[\lceil\alpha/x\rceil \le k \le \lfloor\beta/x\rfloor]$, which sums to $\frac12 x\bigl(\lfloor\beta/x\rfloor\lfloor\beta/x + 1\rfloor - \lceil\alpha/x\rceil\lceil\alpha/x - 1\rceil\bigr)$.
3.21 If $10^n \le 2^M < 10^{n+1}$, there are exactly $n + 1$ such powers of 2, because there's exactly one n-digit power of 2 with leading digit 1, for each n. Therefore the answer is $1 + \lfloor M\log 2\rfloor$.
Note: The number of powers of 2 with leading digit l is more difficult, when $l > 1$; it's $\sum_{0 \le n \le M} \bigl(\lfloor n\log 2 - \log l\rfloor - \lfloor n\log 2 - \log(l + 1)\rfloor\bigr)$.
3.22 All terms are the same for n and $n - 1$ except the kth, where $n = 2^{k-1}q$ and q is odd; we have $S_n = S_{n-1} + 1$ and $T_n = T_{n-1} + 2^k q$. Hence $S_n = n$ and $T_n = n(n + 1)$.
3.23 $X_n = m \iff \frac12 m(m - 1) < n \le \frac12 m(m + 1) \iff m^2 - m + \frac14 < 2n < m^2 + m + \frac14 \iff m - \frac12 < \sqrt{2n} < m + \frac12$.
3.24 Let $\beta = \alpha/(\alpha + 1)$. Then the number of times the nonnegative integer m occurs in Spec($\beta$) is exactly one more than the number of times it occurs in Spec($\alpha$). Why? Because $N(\beta, n) = N(\alpha, n) + n + 1$.
3.25 Continuing the development in the text, if we could find a value of m such that $K_m \le m$, we could violate the stated inequality at $n + 1$ when $n = 2m + 1$. (Also when $n = 3m + 1$ and $n = 3m + 2$.) But the existence of such an $m = n' + 1$ requires that $2K_{\lfloor n'/2\rfloor} \le n'$ or $3K_{\lfloor n'/3\rfloor} \le n'$, i.e., that
$K_{\lfloor n'/2\rfloor} \le \lfloor n'/2\rfloor$ or $K_{\lfloor n'/3\rfloor} \le \lfloor n'/3\rfloor$.
Aha. This goes down further and further, implying that $K_0 \le 0$; but $K_0 = 1$.
What we really want to prove is that $K_n$ is strictly greater than n, for all $n \ge 0$. In fact, it's easy to prove this by induction, although it's a stronger result than the one we couldn't prove!
(This exercise teaches an important lesson. It's more an exercise about the nature of induction than about properties of the floor function.)
“In trying to devise a proof by mathematical induction, you may fail for two opposite reasons. You may fail because you try to prove too much: Your P(n) is too heavy a burden. Yet you may also fail because you try to prove too little: Your P(n) is too weak a support. In general, you have to balance the statement of your theorem so that the support is just enough for the burden.”
-G. Pólya [238]
3.26 Induction, using the stronger hypothesis
$$D_n^{(q)} \le (q - 1)\Bigl(\Bigl(\frac{q}{q-1}\Bigr)^{n+1} - 1\Bigr), \quad \text{for } n \ge 0.$$
3.27 If $D_n^{(3)} = 2^m b - a$, where b is odd and a is 0 or 1, then $D_{n+m}^{(3)} = 3^m b - a$.
3.28 The key observation is that $a_n = m^2$ implies $a_{n+2k+1} = (m + k)^2 + m - k$ and $a_{n+2k+2} = (m + k)^2 + 2m$, for $0 \le k \le m$; hence $a_{n+2m+1} = (2m)^2$. The solution can be written in a nice form discovered by Carl Witty:
$a_{n-1} = 2^l + \bigl\lfloor\bigl(\frac{n - l}{2}\bigr)^2\bigr\rfloor$, when $2^l + l \le n < 2^{l+1} + l + 1$.
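Witty's closed form can be tested against the recurrence. The sketch below assumes the exercise's recurrence is $a_0 = 1$, $a_{n+1} = a_n + \lfloor\sqrt{a_n}\rfloor$ (quoted from the exercise statement, not restated in the answer); the function names are illustrative.

```python
from math import isqrt


def sequence_by_recurrence(count):
    """a_0 = 1, a_{n+1} = a_n + floor(sqrt(a_n)); returns [a_0, ..., a_{count-1}]."""
    a = [1]
    while len(a) < count:
        a.append(a[-1] + isqrt(a[-1]))
    return a


def witty(n):
    """a_{n-1} = 2^l + floor(((n - l)/2)^2), where 2^l + l <= n < 2^(l+1) + l + 1; n >= 1."""
    l = 0
    while 2 ** (l + 1) + l + 1 <= n:
        l += 1
    return 2 ** l + (n - l) ** 2 // 4


a = sequence_by_recurrence(500)
for n in range(1, 501):
    assert witty(n) == a[n - 1]
```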
3.29 D(a’, [an]) is at most the maximum of the right-hand side of
~(a’, Lna],y’) =
-
s
oL,n,V)+S-e-[Oor
ll-++[Oor
11.
(
This logic is seriously floored.
3.30 $X_n = \alpha^{2^n} + \alpha^{-2^n}$, by induction; and $X_n$ is an integer.
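3.31 Here's an "elegant," "impressive" proof that gives no clue about how it was discovered:
$\lfloor x\rfloor + \lfloor y\rfloor + \lfloor x + y\rfloor = \lfloor x + \lfloor y\rfloor\rfloor + \lfloor x + y\rfloor$
$\le \lfloor x + \frac12\lfloor 2y\rfloor\rfloor + \lfloor x + \frac12\lfloor 2y\rfloor + \frac12\rfloor$
$= \lfloor 2x + \lfloor 2y\rfloor\rfloor = \lfloor 2x\rfloor + \lfloor 2y\rfloor$.
But there's also a simple, graphical proof based on the observation that we need to consider only the case $0 \le x, y < 1$. Then the functions look like this in the plane:
A slightly stronger result is possible, namely
$\lceil x\rceil + \lfloor y\rfloor + \lfloor x + y\rfloor \le \lceil 2x\rceil + \lfloor 2y\rfloor$;
but this is stronger only when $\{x\} = \frac12$. If we replace (x, y) by $(-x, x + y)$ in this identity and apply the reflective law (3.4), we get
$\lfloor y\rfloor + \lfloor x + y\rfloor + \lfloor 2x\rfloor \le \lfloor x\rfloor + \lfloor 2x + 2y\rfloor$.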
3.32 Let f(x) be the sum in question. Since $f(x) = f(-x)$, we may assume that $x \ge 0$. The terms are bounded by $2^k$ as $k \to -\infty$ and by $x^2/2^k$ as $k \to +\infty$, so the sum exists for all real x.
We have $f(2x) = 2\sum_k 2^{k-1}\|x/2^{k-1}\|^2 = 2f(x)$. Let $f(x) = l(x) + r(x)$ where $l(x)$ is the sum for $k \le 0$ and $r(x)$ is the sum for $k > 0$. Then $l(x + 1) = l(x)$, and $l(x) \le 1/2$ for all x. When $0 \le x < 1$, we have $r(x) = x^2/2 + x^2/4 + \cdots = x^2$, and $r(x + 1) = (x - 1)^2/2 + (x + 1)^2/4 + (x + 1)^2/8 + \cdots = x^2 + 1$. Hence $f(x + 1) = f(x) + 1$, when $0 \le x < 1$.
We can now prove by induction that $f(x + n) = f(x) + n$ for all integers $n \ge 0$, when $0 \le x < 1$. In particular, $f(n) = n$. Therefore in general, $f(x) = 2^{-m}f(2^m x) = 2^{-m}\lfloor 2^m x\rfloor + 2^{-m}f(\{2^m x\})$. But $f(\{2^m x\}) = l(\{2^m x\}) + r(\{2^m x\}) \le \frac12 + 1$; so $|f(x) - x| \le |2^{-m}\lfloor 2^m x\rfloor - x| + 2^{-m}\cdot\frac32 \le 2^{-m}\cdot\frac52$ for all m.
The inescapable conclusion is that $f(x) = |x|$ for all real x.
3.33 Let $r = n - \frac12$ be the radius of the circle. (a) There are $2n - 1$ horizontal lines and $2n - 1$ vertical lines between cells of the board, and the circle crosses each of these lines twice. Since $r^2$ is not an integer, the Pythagorean theorem tells us that the circle doesn't pass through the corner of any cell. Hence the circle passes through as many cells as there are crossing points, namely $8n - 4 = 8r$. (The same formula gives the number of cells at the edge of the board.) (b) $f(n, k) = 4\lfloor\sqrt{r^2 - k^2}\rfloor$.
It follows from (a) and (b) that
$$\tfrac14\pi r^2 - 2r \le \sum_{0 < k < r} \lfloor\sqrt{r^2 - k^2}\rfloor \le \tfrac14\pi r^2, \qquad r = n - \tfrac12.$$
The task of obtaining more precise estimates of this sum is a famous problem in number theory, investigated by Gauss and many others; see Dickson [65, volume 2, chapter 6].
3.34 (a) Let $m = \lceil\lg n\rceil$. We can add $2^m - n$ terms to simplify the calculations at the boundary:
$f(n) + (2^m - n)m = \sum_{k=1}^{2^m} \lceil\lg k\rceil = \sum_{j,k} j\,[j = \lceil\lg k\rceil][1 \le k \le 2^m]$
$= \sum_{j,k} j\,[2^{j-1} < k \le 2^j][1 \le j \le m] = \sum_{j=1}^{m} j\,2^{j-1} = 2^m(m - 1) + 1$.
Consequently $f(n) = nm - 2^m + 1$.
(b) We have $\lceil n/2\rceil = \lfloor(n + 1)/2\rfloor$, and it follows that the solution to the general recurrence $g(n) = a(n) + g(\lceil n/2\rceil) + g(\lfloor n/2\rfloor)$ must satisfy $\Delta g(n) = \Delta a(n) + \Delta g(\lfloor n/2\rfloor)$. In particular, when $a(n) = n - 1$, $\Delta f(n) = 1 + \Delta f(\lfloor n/2\rfloor)$ is satisfied by the number of bits in the binary representation of n, namely $\lceil\lg(n + 1)\rceil$. Now convert from $\Delta$ to $\sum$.
A more direct solution can be based on the identities $\lceil\lg 2j\rceil = \lceil\lg j\rceil + 1$ and $\lceil\lg(2j - 1)\rceil = \lceil\lg j\rceil + [j > 1]$, for $j \ge 1$.
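3.35 $(n + 1)^2 n!\,e = A_n + (n + 1)^2 + (n + 1) + B_n$, where
$$A_n = \frac{(n+1)^2 n!}{0!} + \frac{(n+1)^2 n!}{1!} + \cdots + \frac{(n+1)^2 n!}{(n-1)!}$$
is a multiple of n and
$$B_n = \frac{(n+1)^2 n!}{(n+2)!} + \frac{(n+1)^2 n!}{(n+3)!} + \cdots = \frac{n+1}{n+2}\Bigl(1 + \frac{1}{n+3} + \frac{1}{(n+3)(n+4)} + \cdots\Bigr) < \frac{n+1}{n+2}\Bigl(1 + \frac{1}{n+3} + \frac{1}{(n+3)(n+3)} + \cdots\Bigr) = \frac{(n+1)(n+3)}{(n+2)^2}$$
is less than 1. Hence the answer is $2 \bmod n$.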
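3.36 The sum is
$\sum_{k,l,m} 2^{-l}4^{-m}\,[m = \lfloor\lg l\rfloor][l = \lfloor\lg k\rfloor][1 < k < 2^{2^n}]$
$= \sum_{k,l,m} 2^{-l}4^{-m}\,[2^m \le l < 2^{m+1}][2^l \le k < 2^{l+1}][0 \le m < n]$
$= \sum_{l,m} 4^{-m}\,[2^m \le l < 2^{m+1}][0 \le m < n]$
$= \sum_m 2^{-m}\,[0 \le m < n] = 2(1 - 2^{-n})$.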
3.37 First consider the case $m < n$, which breaks into subcases based on whether $m < \frac12 n$; then show that both sides change in the same way when m is increased by n.
3.38 At most one $x_k$ can be noninteger. Discard all integer $x_k$, and suppose that n are left. When $\{x\} \ne 0$, the average of $\{mx\}$ as $m \to \infty$ lies between $\frac14$ and $\frac12$; hence $\{mx_1\} + \cdots + \{mx_n\} - \{mx_1 + \cdots + mx_n\}$ cannot have average value zero when $n > 1$.
But the argument just given relies on a difficult theorem about uniform distribution. An elementary proof is possible, sketched here for $n = 2$: Let $P_m$ be the point $(\{mx\}, \{my\})$. Divide the unit square $0 \le x, y < 1$ into triangular regions A and B according as $x + y < 1$ or $x + y \ge 1$. We want to show that $P_m \in B$ for some m, if $\{x\}$ and $\{y\}$ are nonzero. If $P_1 \in B$, we're done. Otherwise there is a disk D of radius $\epsilon > 0$ centered at $P_1$ such that $D \subseteq A$. By Dirichlet's box principle, the sequence $P_1, \ldots, P_N$ must contain two points with $|P_k - P_j| < \epsilon$ and $k > j$, if N is large enough. It follows that $P_{k-j-1}$ is within $\epsilon$ of $(1, 1) - P_1$; hence $P_{k-j-1} \in B$.
This is really only a level 4 problem, in spite of the way it's stated.
3.39 Replace j by $b - j$ and add the term $j = 0$ to the sum, so that exercise 15 can be used for the sum on j. The result, $\lceil x/b^k\rceil - \lceil x/b^{k+1}\rceil + b - 1$, telescopes when summed on k.
3.40 Let $\lfloor 2\sqrt n\rfloor = 4k + r$ where $-2 \le r < 2$, and let $m = \lfloor\sqrt n\rfloor$. Then the following relationships can be proved by induction:
segment | r | m | x | y | if and only if
$W_k$ | $-2$ | $2k-1$ | $m(m+1) - n - k$ | $k$ | $(2k-1)(2k-1) \le n \le (2k-1)(2k)$
$S_k$ | $-1$ | $2k-1$ | $-k$ | $m(m+1) - n + k$ | $(2k-1)(2k) < n < (2k)(2k)$
$E_k$ | $0$ | $2k$ | $n - m(m+1) + k$ | $-k$ | $(2k)(2k) \le n \le (2k)(2k+1)$
$N_k$ | $1$ | $2k$ | $k$ | $n - m(m+1) - k$ | $(2k)(2k+1) < n < (2k+1)(2k+1)$
Thus, when $k \ge 1$, $W_k$ is a segment of length 2k where the path travels west and $y(n) = k$; $S_k$ is a segment of length $2k - 2$ where the path travels south and $x(n) = -k$; etc. (a) The desired formula is therefore
$y(n) = (-1)^m\bigl((n - m(m+1))\cdot[\lfloor 2\sqrt n\rfloor \text{ is odd}] - \lceil\frac12 m\rceil\bigr)$.
(b) On all segments, $k = \max(|x(n)|, |y(n)|)$. On segments $W_k$ and $S_k$ we have $x < y$ and $n + x + y = m(m + 1) = (2k)^2 - 2k$; on segments $E_k$ and $N_k$ we have $x \ge y$ and $n - x - y = m(m + 1) = (2k)^2 + 2k$. Hence the sign is $(-1)^{[x(n) < y(n)]}$.
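The table and the two formulas of answer 3.40 can be exercised together. This is a minimal sketch using exact integer square roots ($\lfloor 2\sqrt n\rfloor$ = `isqrt(4*n)`); the function names are illustrative, and the inverse is written as $n = (2k)^2 \pm (2k + x + y)$ with the sign from part (b), which is how the two relations for $n \pm (x + y)$ rearrange.

```python
from math import isqrt


def spiral(n):
    """(x(n), y(n)) from the table: floor(2*sqrt(n)) = 4k + r, m = floor(sqrt(n))."""
    t, m = isqrt(4 * n), isqrt(n)
    k = (t + 2) // 4              # puts r = t - 4k in {-2, -1, 0, 1}
    r = t - 4 * k
    d = n - m * (m + 1)
    if r == -2:
        return (-d - k, k)        # segment W_k
    if r == -1:
        return (-k, k - d)        # segment S_k
    if r == 0:
        return (d + k, -k)        # segment E_k
    return (k, d - k)             # segment N_k


def y_formula(n):
    """Part (a): y(n) = (-1)^m ((n - m(m+1)) [floor(2 sqrt n) is odd] - ceil(m/2))."""
    m = isqrt(n)
    odd = isqrt(4 * n) % 2
    return (-1) ** m * ((n - m * (m + 1)) * odd - (m + 1) // 2)


def spiral_inverse(x, y):
    """Part (b): n = (2k)^2 + sign * (2k + x + y), with k = max(|x|, |y|)."""
    k = max(abs(x), abs(y))
    sign = -1 if x < y else 1
    return (2 * k) ** 2 + sign * (2 * k + x + y)


points = [spiral(n) for n in range(2000)]
assert all(y_formula(n) == points[n][1] for n in range(2000))
assert all(spiral_inverse(*points[n]) == n for n in range(2000))
# Consecutive points of the path are one unit step apart.
assert all(abs(p[0] - q[0]) + abs(p[1] - q[1]) == 1 for p, q in zip(points, points[1:]))
```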
3.41 Since $1/\phi + 1/\phi^2 = 1$, the stated sequences do partition the positive integers. Since the condition $g(n) = f(f(n)) + 1$ determines f and g uniquely, we need only show that $\lfloor\lfloor n\phi\rfloor\phi\rfloor + 1 = \lfloor n\phi^2\rfloor$ for all $n > 0$. This follows from exercise 3, with $\alpha = \phi$ and $n = 1$.
3.42 No; an argument like the analysis of the two-spectrum case in the text and in exercise 13 shows that a tripartition occurs if and only if $1/\alpha + 1/\beta + 1/\gamma = 1$ and
$$\Bigl\{\frac{n+1}{\alpha}\Bigr\} + \Bigl\{\frac{n+1}{\beta}\Bigr\} + \Bigl\{\frac{n+1}{\gamma}\Bigr\} = 1,$$
for all $n > 0$. But the average value of $\{(n + 1)/\alpha\}$ is 1/2 if $\alpha$ is irrational, by the theorem on uniform distribution. The parameters can't all be rational, and if $\gamma = m/n$ the average is $3/2 - 1/(2n)$. Hence $\gamma$ must be an integer, but this doesn't work either. (There's also a proof of impossibility that uses only simple principles, without the theorem on uniform distribution; see [125].)
3.43 One step of unfolding the recurrence for $K_n$ gives the minimum of the four numbers $1 + a + a\cdot b\cdot K_{\lfloor(n-1-a)/(a\cdot b)\rfloor}$, where a and b are each 2 or 3. (This simplification involves an application of (3.11) to remove floors within floors, together with the identity $x + \min(y, z) = \min(x + y, x + z)$. We must omit terms with negative subscripts; i.e., with $n - 1 - a < 0$.)
Continuing along such lines now leads to the following interpretation: $K_n$ is the least number $> n$ in the multiset S of all numbers of the form
$1 + a_1 + a_1a_2 + a_1a_2a_3 + \cdots + a_1a_2a_3 \ldots a_m$,
where $m \ge 0$ and each $a_k$ is 2 or 3. Thus,
S = {1,3,4,7,9,10,13,15,19,21,22,27,28,31,31,...};
the number 31 is in S "twice" because it has two representations $1 + 2 + 4 + 8 + 16 = 1 + 3 + 9 + 18$. (Incidentally, Michael Fredman [108] has shown that $\lim_{n\to\infty} K_n/n = 1$, i.e., that S has no enormous gaps.)
Too easy.
A more interesting (still unsolved) problem: Restrict both $\alpha$ and $\beta$ to be < 1, and ask when the given multiset determines the unordered pair $\{\alpha, \beta\}$.
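The multiset interpretation can be compared with the recurrence on a small range. The sketch below assumes the recurrence from the text is $K_0 = 1$, $K_{n+1} = 1 + \min(2K_{\lfloor n/2\rfloor}, 3K_{\lfloor n/3\rfloor})$ (it is not restated in the answer); the function names are illustrative.

```python
def knuth_numbers(limit):
    """K_0 = 1 and K_{n+1} = 1 + min(2 K_{floor(n/2)}, 3 K_{floor(n/3)}); returns K[0..limit]."""
    K = [1]
    for n in range(limit):
        K.append(1 + min(2 * K[n // 2], 3 * K[n // 3]))
    return K


def multiset_S(bound):
    """All sums 1 + a1 + a1*a2 + ... with each a_k in {2, 3}, up to `bound`, with multiplicity."""
    values, frontier = [], [(1, 1)]          # (current sum, last product)
    while frontier:
        total, prod = frontier.pop()
        values.append(total)
        for a in (2, 3):
            if total + prod * a <= bound:    # sums only grow, so pruning is safe
                frontier.append((total + prod * a, prod * a))
    return sorted(values)


K = knuth_numbers(300)
S = multiset_S(2000)
for n in range(301):
    assert K[n] > n                                   # the stronger claim of answer 3.25
    assert K[n] == min(s for s in S if s > n)         # least element of S exceeding n
```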
3.44 Let $d_n^{(q)} = D_{n-1}^{(q)} \text{ mumble } (q - 1)$, so that $D_n^{(q)} = (qD_{n-1}^{(q)} + d_n^{(q)})/(q - 1)$ and $a_n^{(q)} = \lceil D_{n-1}^{(q)}/(q - 1)\rceil$. Now $D_{k-1}^{(q)} \le (q - 1)n \iff a_k^{(q)} \le n$, and the results follow. (This is the solution found by Euler [94], who determined the a's and d's sequentially without realizing that a single sequence $D_n^{(q)}$ would suffice.)
3.45 Let $\alpha > 1$ satisfy $\alpha + 1/\alpha = 2m$. Then we find $2Y_n = \alpha^{2^n} + \alpha^{-2^n}$, and it follows that $Y_n = \lceil\alpha^{2^n}/2\rceil$.
2n(n+
1) =
[2(n+
:)‘I.
Let
n+B
=
(fi’
+
fi’-‘)rn
and n’ + 8’ =
(fi”’
+
&!‘)m,
where 0 <
8,8’
< 1.
Then 8’ = 20 mod 1 = 28
-
d, where d is 0 or 1. We want to prove that
n’ =
Lfi(n
+
i
)]
;
this equality holds if and only if
0 <
e/(2-JZ)+Jz(i
-d) < 2.
To solve the recurrence, note that Spec( 1 + 1
/fi
) and Spec( 1 +
fi
) partition
the positive integers; hence any positive integer a can be written uniquely in
the form a =
\(&’
+
fi”)m],
w
here 1 and m are integers with m odd
and
1
> 0. It follows that
L,
=
L(
fi’+”
+
fi”nP’)mj.
3.47 (a) $c = -\frac12$. (b) c is an integer. (c) $c = 0$. (d) c is arbitrary. See the answer to exercise 1.2.4-40 in [173] for more general results.
3.48 (Solution by Heinrich Rolletschek.) We can replace $(\alpha, \beta)$ by $(\{\beta\}, \alpha + \lfloor\beta\rfloor)$ without changing $\lfloor n\alpha\rfloor + \lfloor n\beta\rfloor$. Hence the condition $\alpha = \{\beta\}$ is necessary. It is also sufficient: Let $m = \lfloor\beta\rfloor$ be the least element of the given multiset, and let S be the multiset obtained from the given one by subtracting mn from the nth smallest element, for all n. If $\alpha = \{\beta\}$, consecutive elements of S differ by either 0 or 2, hence the multiset $\frac12 S = \mathrm{Spec}(\alpha)$ determines $\alpha$.
3.49 According to unpublished notes of William A. Veech, it is sufficient to have $\alpha\beta$, $\beta$, and 1 linearly independent over the rationals.
3.50 H. S. Wilf observes that the functional equation $f(x^2 - 1) = f(x)^2$ would determine f(x) for all $x \ge \phi$ if we knew f(x) on any interval $(\phi\,..\,\phi + \epsilon)$.
3.51 There are infinitely many ways to partition the positive integers into three or more generalized spectra with irrational $\alpha_k$; for example,
Spec($2\alpha$; 0) ∪ Spec($4\alpha$; $-\alpha$) ∪ Spec($4\alpha$; $-3\alpha$) ∪ Spec($\beta$; 0)
works. But there's a precise sense in which all such partitions arise by "expanding" a basic one, Spec($\alpha$) ∪ Spec($\beta$); see [128]. The only known rational examples, e.g.,
Spec(7; -3) ∪ Spec($\frac72$; -1) ∪ Spec($\frac74$; 0),
are based on parameters like those in the stated conjecture, which is due to A. S. Fraenkel [103].
3.52 Partial results are discussed in [77, pages 30-31].
4.1 1, 2, 4, 6, 16, 12.
4.2 Note that $m_p + n_p = \min(m_p, n_p) + \max(m_p, n_p)$. The recurrence $\mathrm{lcm}(m, n) = \bigl(n/(n \bmod m)\bigr)\,\mathrm{lcm}(n \bmod m, m)$ is valid but not really advisable for computing lcm's; the best way known to compute lcm(m, n) is to compute gcd(m, n) first and then to divide mn by the gcd.
“Man made the integers: All else is Dieudonné.”
-R. K. Guy
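Both ways of computing an lcm mentioned in answer 4.2 can be put side by side. This is a minimal sketch; the base case "return n when m divides n" is an assumption added here so that the recurrence never divides by zero, and the function names are illustrative.

```python
from math import gcd


def lcm_via_gcd(m, n):
    """The recommended route: compute gcd(m, n) first, then divide mn by it."""
    return m // gcd(m, n) * n


def lcm_by_recurrence(m, n):
    """lcm(m, n) = (n / (n mod m)) * lcm(n mod m, m), for positive integers m and n."""
    r = n % m
    if r == 0:
        return n                  # assumed base case: m divides n, so lcm(m, n) = n
    return n * lcm_by_recurrence(r, m) // r


for m in range(1, 41):
    for n in range(1, 41):
        assert lcm_by_recurrence(m, n) == lcm_via_gcd(m, n)
```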
4.3 This holds if x is an integer, but $\pi(x)$ is defined for all real x. The correct formula,
$\pi(x) - \pi(x - 1) = [\lfloor x\rfloor \text{ is prime}]$,
is easy to verify.
4.4 Between $\frac10$ and $\frac{0}{-1}$ we'd have a left-right reflected Stern-Brocot tree with all denominators negated, etc. So the result is all fractions m/n with $m \perp n$. The condition $m'n - mn' = 1$ still holds throughout the construction. (This is called the Stern-Brocot wreath, because we can conveniently regard the final $\frac01$ as identical to the first $\frac01$, thereby joining the trees in a cycle at the top. The Stern-Brocot wreath has interesting applications to computer graphics because it represents all rational directions in the plane.)
4.5 $L^k = \begin{pmatrix} 1 & k \\ 0 & 1 \end{pmatrix}$ and $R^k = \begin{pmatrix} 1 & 0 \\ k & 1 \end{pmatrix}$; this holds even when $k < 0$. (We will find a general formula for any product of L's and R's in Chapter 6.)
4.6 $a = b$. (Chapter 3 defined $x \bmod 0 = x$, primarily so that this would be true.)
After all, 'mod y' sort of means "pretend y is zero." So if it already is, there's nothing to pretend.
4.7 We need $m \bmod 10 = 0$, $m \bmod 9 = k$, and $m \bmod 8 = 1$. But m can't be both even and odd.
4.8 We want 1 Ox
+
6y
=
1 Ox
+
y (mod
15);
hence 5y
=
0 (mod 15); hence
y
s
0 (mod 3). We must have y = 0 or 3, and x = 0 or 1.
4.9 $3^{2k+1} \bmod 4 = 3$, so $(3^{2k+1}-1)/2$ is odd. The stated number is divisible by $(3^7-1)/2$ and $(3^{11}-1)/2$ (and by other numbers).
4.10 $999(1-\frac13)(1-\frac1{37}) = 648$.
4.11 $\sigma(0) = 1$; $\sigma(1) = -1$; $\sigma(n) = 0$ for $n > 1$. (Generalized Möbius functions defined on arbitrary partially ordered structures have interesting and important properties, first explored by Weisner [299] and developed by many other people, notably Gian-Carlo Rota [254].)
4.12 $\sum_{d\backslash m}\sum_{k\backslash d}\mu(d/k)\,g(k) = \sum_{k\backslash m}\sum_{d\backslash(m/k)}\mu(d)\,g(k) = \sum_{k\backslash m} g(k)\times[m/k=1] = g(m)$, by (4.7) and (4.9).
4.13 (a) $n_p \le 1$ for all $p$; (b) $\mu(n) \ne 0$.
4.14 True when $k \ge 0$. Use (4.12), (4.14), and (4.15).
4.15 No. For example, $e_n \bmod 5 = [2 \text{ or } 3]$; $e_n \bmod 11 = [2, 3, 7, \text{ or } 10]$.
4.16 $1/e_1 + 1/e_2 + \cdots + 1/e_n = 1 - 1/\bigl(e_n(e_n-1)\bigr) = 1 - 1/(e_{n+1}-1)$.
4.17 We have $f_n \bmod f_m = 2$; hence $\gcd(f_n, f_m) = \gcd(2, f_m) = 1$. (Incidentally, the relation $f_n = f_0 f_1 \ldots f_{n-1} + 2$ is very similar to the recurrence that defines the Euclid numbers $e_n$.)
4.18 If $n = qm$ and $q$ is odd, $2^n+1 = (2^m+1)(2^{n-m} - 2^{n-2m} + \cdots - 2^m + 1)$.
4.19 Let $p_1 = 2$ and let $p_n$ be the smallest prime greater than $2^{p_{n-1}}$. Then $2^{p_{n-1}} < p_n < 2^{p_{n-1}+1}$, and it follows that we can take $b = \lim_{n\to\infty} \lg^{(n)} p_n$, where $\lg^{(n)}$ is the function lg iterated $n$ times. The stated numerical value comes from $p_2 = 5$, $p_3 = 37$. It turns out that $p_4 = 2^{37} + 9$, and this gives the more precise value
$b \approx 1.2516475977905$
(but no clue about $p_5$).
4.20 By Bertrand’s postulate, $P_n < 10^n$. Let
$K = \sum_{k\ge1} 10^{-k^2} P_k = .200300005\ldots\,$.
Then $10^{n^2} K \equiv P_n + \text{fraction} \pmod{10^{2n-1}}$.
4.21 The first sum is $\pi(n)$, since the summand is $[k+1 \text{ is prime}]$. The inner sum in the second is $\sum_{1\le k<m}[k\backslash m]$, so it is greater than 1 if and only if $m$ is composite; again we get $\pi(n)$. Finally $\lceil\{m/n\}\rceil = [n \nmid m]$, so the third sum is an application of Wilson’s theorem. To evaluate $\pi(n)$ by any of these formulas is, of course, sheer lunacy.
4.22 $(b^{mn}-1)/(b-1) = \bigl((b^m-1)/(b-1)\bigr)(b^{mn-m} + \cdots + 1)$. [The only prime numbers of the form $(10^p-1)/9$ for $p < 2000$ occur when $p = 2$, 19, 23, 317, 1031.]
4.23 $\rho(2k+1) = 0$; $\rho(2k) = \rho(k) + 1$, for $k \ge 1$. By induction we can show that $\rho(n) = \rho(n - 2^m)$, if $n > 2^m$ and $m > \rho(n)$. The $k$th Hanoi move is disk $\rho(k)$, if we number the disks $0, 1, \ldots, n-1$. This is clear if $k$ is a power of 2. And if $2^m < k < 2^{m+1}$, we have $\rho(k) < m$; moves $k$ and $k - 2^m$ correspond in the sequence that transfers $m+1$ disks in $T_m + 1 + T_m$ steps.
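The claim that the $k$th Hanoi move involves disk $\rho(k)$ can be checked mechanically. This is a minimal sketch with illustrative names; it assumes the standard recursive solution, with disks numbered from 0 (smallest) to $n-1$.

```python
def rho(k):
    """Ruler function: the largest m such that 2**m divides k (k >= 1)."""
    m = 0
    while k % 2 == 0:
        k //= 2
        m += 1
    return m

def hanoi_disks(n):
    """Disks moved, in order, by the recursive transfer of n disks."""
    if n == 0:
        return []
    smaller = hanoi_disks(n - 1)           # T_{n-1} moves
    return smaller + [n - 1] + smaller     # then the largest disk, then T_{n-1} again
```

Comparing `hanoi_disks(5)` with `[rho(k) for k in range(1, 32)]` shows the two sequences agree move by move.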
4.24 The digit that contributes $dp^m$ to $n$ contributes $dp^{m-1} + \cdots + d = d(p^m-1)/(p-1)$ to $\epsilon_p(n!)$, hence $\epsilon_p(n!) = \bigl(n - \nu_p(n)\bigr)/(p-1)$.
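The formula in answer 4.24 is easy to test numerically. In this sketch (function names are illustrative), `digit_sum(n, p)` plays the role of $\nu_p(n)$, the sum of the radix-$p$ digits of $n$, and `factorial_exponent(n, p)` computes $\epsilon_p(n!)$ by the usual sum of floors.

```python
def digit_sum(n, p):
    """Sum of the digits of n in radix p."""
    s = 0
    while n > 0:
        s += n % p
        n //= p
    return s

def factorial_exponent(n, p):
    """Exponent of the prime p in n!, as floor(n/p) + floor(n/p^2) + ..."""
    e = 0
    while n > 0:
        n //= p
        e += n
    return e
```

For instance, `factorial_exponent(1000, 5)` is 249, and `(1000 - digit_sum(1000, 5)) // 4` gives the same value.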
4.25 $m \backslash\backslash n \iff m_p = 0$ or $m_p = n_p$, for all $p$. It follows that (a) is true. But (b) fails, in our favorite example $m = 12$, $n = 18$. (This is a common fallacy.)
4.26 Yes, since $\mathcal{G}_N$ defines a subtree of the Stern-Brocot tree.
4.27 Extend the shorter string with M’s (since M lies alphabetically be-
tween L and R) until both strings are the same length, then use dictionary
order. For example, the topmost levels of the tree are LL < LM < LR <
MM < RL < RM < RR. (Another solution is to append the infinite string
$RL^\infty$ to both inputs, and to keep comparing until finding L < R.)
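The padding rule of answer 4.27 translates almost literally into code. The sketch below uses a hypothetical helper name and relies on the fact that the characters `L`, `M`, `R` are already in alphabetical (and ASCII) order.

```python
def compare_sb(a, b):
    """Compare two fractions given by their Stern-Brocot strings.

    Returns -1, 0, or 1 as the first fraction is less than, equal to,
    or greater than the second.
    """
    width = max(len(a), len(b))
    pa = a.ljust(width, "M")    # extend the shorter string with M's
    pb = b.ljust(width, "M")
    return (pa > pb) - (pa < pb)    # dictionary order on the padded strings
```

With this convention the empty string stands for the root 1/1, so `compare_sb("L", "")` is -1 (1/2 < 1/1) and `compare_sb("R", "RL")` is 1 (2/1 > 3/2).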
4.28 We need to use only the first part of the representation:
R R R L L L L L L L R R R R R R ...
1/1, 2/1, 3/1, 4/1, 7/2, 10/3, 13/4, 16/5, 19/6, 22/7, 25/8, 47/15, 69/22, 91/29, 113/36, 135/43, ...
The fraction $\frac41$ appears because it’s a better upper bound than $\frac10$, not because it’s closer than $\frac31$. Similarly, $\frac{25}{8}$ is a better lower bound than 3. The simplest upper bounds and the simplest lower bounds all appear, but the next really good approximation doesn’t occur until just before the string of R’s switches back to L.
4.29 $1/\alpha$. To get $1 - x$ from $x$ in binary notation, we interchange 0 and 1; to get $1/\alpha$ from $\alpha$ in Stern-Brocot notation, we interchange L and R. (The finite cases must also be considered, but they must work since the correspondence is order preserving.)
4.30 The $m$ integers $x \in [A, A+m)$ are different mod $m$; hence their residues $(x \bmod m_1, \ldots, x \bmod m_r)$ run through all $m_1 \ldots m_r = m$ possible values, one of which must be $(a_1 \bmod m_1, \ldots, a_r \bmod m_r)$ by the pigeonhole principle.
4.31 A number in radix $b$ notation is divisible by $d$ if and only if the sum of its digits is divisible by $d$, whenever $b \equiv 1 \pmod d$. This follows because $(a_m \ldots a_0)_b = a_m b^m + \cdots + a_0 b^0 \equiv a_m + \cdots + a_0$.
4.32 The $\varphi(m)$ numbers $\{\,kn \bmod m \mid k \perp m \text{ and } 0 \le k < m\,\}$ are the numbers $\{\,k \mid k \perp m \text{ and } 0 \le k < m\,\}$ in some order. Multiply them together and divide by $\prod_{0\le k<m,\;k\perp m} k$.
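4.33 Obviously $h(1) = 1$. If $m \perp n$ then $h(mn) = \sum_{d\backslash mn} f(d)\,g(mn/d) = \sum_{c\backslash m,\,d\backslash n} f(cd)\,g\bigl((m/c)(n/d)\bigr) = \sum_{c\backslash m}\sum_{d\backslash n} f(c)\,g(m/c)\,f(d)\,g(n/d)$; this is $h(m)\,h(n)$, since $c \perp d$ for every term in the sum.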
4.34 $g(m) = \sum_{d\backslash m} f(d) = \sum_{d\backslash m} f(m/d) = \sum_{d\ge1} f(m/d)$ if $f(x)$ is zero when $x$ is not an integer.
4.35 The base cases are
$I(0,n) = 0$; $I(m,0) = 1$.
When $m, n > 0$, there are two rules, where the first is trivial if $m > n$ and the second is trivial if $m < n$:
$I(m,n) = I(m, n \bmod m) - \lfloor n/m\rfloor\, I(n \bmod m, m)$;
$I(m,n) = I(m \bmod n, n)$.
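The recurrence of answer 4.35 can be run as it stands once a choice is made about which rule to apply. The sketch below assumes that the defining property is $I(m,n)\,m + I(n,m)\,n = \gcd(m,n)$ for $m \ne n$, and it applies the first rule when $m \le n$ and the second otherwise; that dispatch is my own choice, not something the answer prescribes.

```python
from math import gcd

def I(m, n):
    """Coefficient of m in a representation of gcd(m, n), per the 4.35 recurrence."""
    if m == 0:
        return 0
    if n == 0:
        return 1
    if m <= n:
        # first rule (nontrivial when m <= n)
        return I(m, n % m) - (n // m) * I(n % m, m)
    # second rule (nontrivial when m > n)
    return I(m % n, n)
```

For example, `I(7, 12)` is -5 and `I(12, 7)` is 3, and indeed $-5\cdot7 + 3\cdot12 = 1$.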
4.36 A factorization of any of the given quantities into nonunits must have $m^2 - 10n^2 = \pm2$ or $\pm3$, but this is impossible mod 10.
4.37 Let $a_n = 2^{-n}\ln(e_n - \frac12)$ and $b_n = 2^{-n}\ln(e_n + \frac12)$. Then $e_n = \lfloor E^{2^n} + \frac12\rfloor \iff a_n \le \ln E < b_n$. And $a_{n-1} < a_n < b_n < b_{n-1}$, so we can take $E = \lim_{n\to\infty} e^{a_n}$.
In fact, it
turns out that
a product that converges rapidly to $(1.26408473530530111)^2$. But these observations don’t tell us what $e_n$ is, unless we can find another expression for $E$ that doesn’t depend on Euclid numbers.
4.38 $a^n - b^n = (a^m - b^m)(a^{n-m}b^0 + a^{n-2m}b^m + \cdots + a^{n \bmod m}\,b^{\,n-m-n \bmod m}) + b^{m\lfloor n/m\rfloor}(a^{n \bmod m} - b^{n \bmod m})$.
4.39 If $a_1 \ldots a_t$ and $b_1 \ldots b_u$ are perfect squares, so is
$a_1 \ldots a_t\, b_1 \ldots b_u / c_1^2 \ldots c_v^2$,
where $\{a_1, \ldots, a_t\} \cap \{b_1, \ldots, b_u\} = \{c_1, \ldots, c_v\}$. (It can be shown, in fact, that the sequence $\langle S(1), S(2), S(3), \ldots\,\rangle$ contains every nonprime positive integer exactly once.)
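4.40 Let $f(n) = \prod_{1\le k\le n,\;p\nmid k} k = n!/\bigl(p^{\lfloor n/p\rfloor}\lfloor n/p\rfloor!\bigr)$ and $g(n) = n!/p^{\epsilon_p(n!)}$. Then
$g(n) = f(n)\,f(\lfloor n/p\rfloor)\,f(\lfloor n/p^2\rfloor)\ldots = f(n)\,g(\lfloor n/p\rfloor)$.
Also $f(n) \equiv a_0!\,(p-1)!^{\lfloor n/p\rfloor} \equiv a_0!\,(-1)^{\lfloor n/p\rfloor} \pmod p$, and $\epsilon_p(n!) = \lfloor n/p\rfloor + \epsilon_p(\lfloor n/p\rfloor!)$. These recurrences make it easy to prove the result by induction. (Several other solutions are possible.)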
4.41 (a) If $n^2 \equiv -1 \pmod p$ then $(n^2)^{(p-1)/2} \equiv -1$; but Fermat says it’s $+1$. (b) Let $n = ((p-1)/2)!$; we have $n \equiv (-1)^{(p-1)/2}\prod_{1\le k<p/2}(p-k) = (p-1)!/n$, hence $n^2 \equiv (p-1)!$.
4.42 First we observe that $k \perp l \iff k \perp l + ak$ for any integer $a$, since $\gcd(k, l) = \gcd(k, l + ak)$ by Euclid’s algorithm. Now
$m \perp n$ and $n' \perp n \iff mn' \perp n \iff mn' + nm' \perp n$.
Similarly
$m' \perp n'$ and $n \perp n' \iff mn' + nm' \perp n'$.
Hence
$m \perp n$ and $m' \perp n'$ and $n \perp n' \iff mn' + nm' \perp nn'$.
4.43 We want to multiply by $L^{-1}R$, then by $R^{-1}L^{-1}RL$, then $L^{-1}R$, then $R^{-2}L^{-1}RL^2$, etc.; the $n$th multiplier is $R^{-\rho(n)}L^{-1}RL^{\rho(n)}$, since we must cancel $\rho(n)$ R’s. And $R^{-m}L^{-1}RL^m = \begin{pmatrix}0&-1\\1&2m+1\end{pmatrix}$.
4.44 We can find the simplest rational number that lies in
$[.3155, .3165) = [\frac{631}{2000}, \frac{633}{2000})$
by looking at the Stern-Brocot representations of $\frac{631}{2000}$ and $\frac{633}{2000}$ and stopping just before the former has L where the latter has R:
    (m1, n1, m2, n2) := (631, 2000, 633, 2000);
    while m1 > n1 or m2 < n2 do
      if m2 < n2 then (output(L); (n1, n2) := (n1, n2) - (m1, m2))
      else (output(R); (m1, m2) := (m1, m2) - (n1, n2)).
The output is LLLRRRRR $= \frac{6}{19} \approx .3158$. Incidentally, an average of .334 implies at least 287 at bats.
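The loop of answer 4.44 can be transcribed into Python more or less line by line. The sketch below is only that; the function names are illustrative, and the second helper (which converts an L/R string back to a fraction by taking mediants) is an addition of mine for checking the output.

```python
from fractions import Fraction

def simplest_between(m1, n1, m2, n2):
    """Stern-Brocot string produced by the 4.44 loop for [m1/n1, m2/n2)."""
    path = []
    while m1 > n1 or m2 < n2:
        if m2 < n2:
            path.append("L")
            n1, n2 = n1 - m1, n2 - m2
        else:
            path.append("R")
            m1, m2 = m1 - n1, m2 - n2
    return "".join(path)

def sb_to_fraction(path):
    """Fraction reached from the root 1/1 by following a string of L's and R's."""
    lo_m, lo_n, hi_m, hi_n = 0, 1, 1, 0          # bounds 0/1 and 1/0
    for step in path:
        mid_m, mid_n = lo_m + hi_m, lo_n + hi_n  # current node is the mediant
        if step == "L":
            hi_m, hi_n = mid_m, mid_n
        else:
            lo_m, lo_n = mid_m, mid_n
    return Fraction(lo_m + hi_m, lo_n + hi_n)
```

Calling `simplest_between(631, 2000, 633, 2000)` yields the string `LLLRRRRR`, and `sb_to_fraction` maps that string to 6/19.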
4.45 $x^2 \equiv x \pmod{10^n} \iff x(x-1) \equiv 0 \pmod{2^n}$ and $x(x-1) \equiv 0 \pmod{5^n} \iff x \bmod 2^n = [0 \text{ or } 1]$ and $x \bmod 5^n = [0 \text{ or } 1]$. (The last step is justified because $x(x-1) \bmod 5 = 0$ implies that either $x$ or $x-1$ is a multiple of 5, in which case the other factor is relatively prime to $5^n$ and can be divided from the congruence.)
So there are at most four solutions, of which two ($x = 0$ and $x = 1$) don’t qualify for the title “n-digit number” unless $n = 1$. The other two solutions have the forms $x$ and $10^n + 1 - x$, and at least one of these numbers is $\ge 10^{n-1}$. When $n = 4$ the other solution, $10001 - 9376 = 625$, is not a four-digit number. We expect to get two n-digit solutions for about 90% of all $n$, but this conjecture has not been proved.
(Such self-reproducing numbers have been called “automorphic.”)
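For small $n$ the automorphic numbers of answer 4.45 can simply be found by exhaustive search. This brute-force sketch (the function name is illustrative) counts only numbers without a leading zero, except when $n = 1$.

```python
def automorphic(n):
    """All n-digit x with x*x congruent to x modulo 10**n."""
    modulus = 10 ** n
    low = 0 if n == 1 else 10 ** (n - 1)    # exclude leading zeros for n > 1
    return [x for x in range(low, modulus) if (x * x - x) % modulus == 0]
```

Here `automorphic(3)` returns `[376, 625]`, while `automorphic(4)` returns only `[9376]`, since the companion solution 0625 is not a four-digit number.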
4.46 (a) If $j'j - k'k = \gcd(j,k)$, we have $n^{k'k}\,n^{\gcd(j,k)} = n^{j'j} = 1$ and $n^{k'k} = 1$. (b) Let $n = pq$, where $p$ is the smallest prime divisor of $n$. If $2^n \equiv 1 \pmod n$ then $2^n \equiv 1 \pmod p$. Also $2^{p-1} \equiv 1 \pmod p$; hence $2^{\gcd(p-1,n)} \equiv 1 \pmod p$. But $\gcd(p-1, n) = 1$ by the definition of $p$.
4.47 If $n^{m-1} \equiv 1 \pmod m$ we must have $n \perp m$. If $n^k \equiv n^j$ for some $1 \le j < k < m$, then $n^{k-j} \equiv 1$ because we can divide by $n^j$. Therefore if the numbers $n^1 \bmod m$, ..., $n^{m-1} \bmod m$ are not distinct, there is a $k < m-1$ with $n^k \equiv 1$. The least such $k$ divides $m-1$, by exercise 46(a). But then $kq = (m-1)/p$ for some prime $p$ and some positive integer $q$; this is impossible, since $n^{kq} \not\equiv 1$. Therefore the numbers $n^1 \bmod m$, ..., $n^{m-1} \bmod m$ are distinct and relatively prime to $m$. Therefore the numbers $1, \ldots, m-1$ are relatively prime to $m$, and $m$ must be prime.
4.48 By pairing numbers up with their inverses, we can reduce the product (mod $m$) to $\prod_{1\le n<m,\;n^2 \bmod m = 1} n$. Now we can use our knowledge of the solutions to $n^2 \bmod m = 1$. By residue arithmetic we find that the result is $m - 1$ if $m = 4$, $p^k$, or $2p^k$ ($p > 2$); otherwise it’s $+1$.
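The case analysis in answer 4.48 can be confirmed for small moduli by brute force. The sketch below assumes that the product in question runs over all $n$ with $1 \le n < m$ and $n \perp m$; the helper names are illustrative.

```python
from math import gcd

def unit_product(m):
    """Product (mod m) of all n with 1 <= n < m and gcd(n, m) = 1."""
    prod = 1
    for n in range(1, m):
        if gcd(n, m) == 1:
            prod = prod * n % m
    return prod

def is_odd_prime_power(q):
    """True if q = p**k for some odd prime p and some k >= 1."""
    if q < 3 or q % 2 == 0:
        return False
    p = 3
    while p * p <= q:
        if q % p == 0:
            while q % p == 0:
                q //= p
            return q == 1       # no other prime factor may remain
        p += 2
    return True                 # q itself is an odd prime

def predicted(m):
    """Value claimed by answer 4.48, for m >= 3."""
    special = m == 4 or is_odd_prime_power(m) or (m % 2 == 0 and is_odd_prime_power(m // 2))
    return m - 1 if special else 1
```

For example, `unit_product(9)` is 8 and `unit_product(12)` is 1, as `predicted` says.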
4.49 (a) Either $m < n$ ($\Phi(N-1)$ cases) or $m = n$ (one case) or $m > n$ ($\Phi(N-1)$ again). Hence $R(N) = 2\Phi(N-1) + 1$. (b) From (4.62) we get
$2\Phi(N-1) + 1 = 1 + \sum_{d\ge1} \mu(d)\,\lfloor N/d\rfloor\,\lfloor N/d - 1\rfloor$;
hence the stated result holds if and only if
$\sum_{d\ge1} \mu(d)\,\lfloor N/d\rfloor = 1$, for $N \ge 1$.
And this is a special case of (4.61) if we set $f(x) = [x \ge 1]$.
4.50 (a) If $f$ is any function,
$\sum_{0\le k<m} f(k) = \sum_{d\backslash m}\sum_{0\le k<m} f(k)\,[d = \gcd(k,m)]$
$= \sum_{d\backslash m}\sum_{0\le k<m} f(k)\,[k/d \perp m/d]$
$= \sum_{d\backslash m}\sum_{0\le k<m/d} f(kd)\,[k \perp m/d]$
$= \sum_{d\backslash m}\sum_{0\le k<d} f(km/d)\,[k \perp d]$;
we saw a special case of this in the derivation of (4.63). An analogous derivation holds for $\prod$ instead of $\sum$. Thus we have
$z^m - 1 = \prod_{0\le k<m}(z - \omega^k) = \prod_{d\backslash m}\;\prod_{0\le k<d,\;k\perp d}(z - \omega^{km/d}) = \prod_{d\backslash m}\Psi_d(z)$
because $\omega^{m/d} = e^{2\pi i/d}$.
Part (b) follows from part (a) by the analog of (4.56) for products instead of sums. Incidentally, this formula shows that $\Psi_m(z)$ has integer coefficients, since $\Psi_m(z)$ is obtained by multiplying and dividing polynomials whose leading coefficient is 1.
4.51 $(x_1 + \cdots + x_n)^p = \sum_{k_1+\cdots+k_n=p} p!/(k_1! \ldots k_n!)\; x_1^{k_1} \ldots x_n^{k_n}$, and the coefficient is divisible by $p$ unless some $k_j = p$. Hence $(x_1 + \cdots + x_n)^p \equiv x_1^p + \cdots + x_n^p \pmod p$. Now we can set all the $x$’s to 1, obtaining $n^p \equiv n$.
4.52 If $p > n$ there is nothing to prove. Otherwise $x \perp p$, so $x^{k(p-1)} \equiv 1 \pmod p$; this means that at least $\lfloor (n-1)/(p-1)\rfloor$ of the given numbers are multiples of $p$. And $(n-1)/(p-1) \ge n/p$ since $n \ge p$.
4.53 First show that if $m \ge 6$ and $m$ is not prime then $(m-2)! \equiv 0 \pmod m$. (If $m = p^2$, the product for $(m-2)!$ includes $p$ and $2p$; otherwise it includes $d$ and $m/d$ where $d < m/d$.) Next consider cases:
Case 0, $n < 5$. The condition holds for $n = 1$ only.
Case 1, $n \ge 5$ and $n$ is prime. Then $(n-1)!/(n+1)$ is an integer and it can’t be a multiple of $n$.
Case 2, $n \ge 5$, $n$ is composite, and $n+1$ is composite. Then $n$ and $n+1$ divide $(n-1)!$, and $n \perp n+1$; hence $n(n+1) \backslash (n-1)!$.
Case 3, $n \ge 5$, $n$ is composite, and $n+1$ is prime. Then $(n-1)! \equiv 1 \pmod{n+1}$ by Wilson’s theorem, and
$\lceil (n-1)!/(n+1)\rceil = \bigl((n-1)! + n\bigr)/(n+1)$;
this is divisible by $n$.
Therefore the answer is: Either $n = 1$ or $n \ne 4$ is composite.
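The conclusion of answer 4.53 can be checked by direct computation. This sketch assumes that the condition in the exercise is that $n$ divides $\lceil (n-1)!/(n+1)\rceil$, which is the reading consistent with the formula used in Case 3; the function names are illustrative.

```python
from math import factorial

def satisfies(n):
    """True if n divides ceil((n-1)! / (n+1)), computed in exact integers."""
    q = -(-factorial(n - 1) // (n + 1))    # ceiling via negated floor division
    return q % n == 0

def is_composite(n):
    """True if n has a divisor d with 1 < d < n."""
    return n > 3 and any(n % d == 0 for d in range(2, int(n ** 0.5) + 1))
```

Under that assumption, `[n for n in range(1, 13) if satisfies(n)]` comes out as `[1, 6, 8, 9, 10, 12]`.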
4.54 $\epsilon_2(1000!) > 500$ and $\epsilon_5(1000!) = 249$, hence $1000! = a \cdot 10^{249}$ for some even integer $a$. Since $1000 = (13000)_5$, exercise 40 tells us that $a \cdot 2^{249} = 1000!/5^{249} \equiv -1 \pmod 5$. Also $2^{249} \equiv 2$, hence $a \equiv 2$, hence $a \bmod 10 = 2$ or 7; hence the answer is $2 \cdot 10^{249}$.
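With exact integer arithmetic the hand calculation of answer 4.54 can be double-checked in one line. The sketch assumes the quantity asked for is $1000! \bmod 10^{250}$; the function name is illustrative.

```python
from math import factorial

def factorial_tail(n=1000, digits=250):
    """n! reduced modulo 10**digits, using exact integers."""
    return factorial(n) % 10 ** digits
```

The result of `factorial_tail()` is a 2 followed by 249 zeros, matching $2 \cdot 10^{249}$.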
4.55 One way is to prove by induction that $P_{2n}/P_n^4(n+1)$ is an integer;
this stronger result helps the induction go through. Another way is based
on showing that each prime p divides the numerator at least as often as it
divides the denominator. This reduces to proving the inequality
k=l
which follows from
k=l
[(In
-
1
)/ml
+
LWmj
3
lnhl
The latter is true when 0 6 n < m, and both sides increase by 4 when n is
increased by m.
4.56 Let $f(m) = \sum_{k=1}^{2n-1} \min(k, 2n-k)\,[m\backslash k]$, $g(m) = \sum_{k=1}^{n-1} (2n-2k-1) \times [m\backslash(2k+1)]$. The number of times $p$ divides the numerator of the stated product is $f(p) + f(p^2) + f(p^3) + \cdots$, and the number of times $p$ divides the denominator is $g(p) + g(p^2) + g(p^3) + \cdots$. But $f(m) = g(m)$ whenever $m$ is odd, by exercise 2.32. The stated product therefore reduces to $2^{n(n-1)}$, by exercise 3.22.
4.57 The hint suggests a standard interchange of summation, since
$\sum_{1\le m\le n} [d\backslash m] = \sum_{0<k\le n/d} [m = dk] = \lfloor n/d\rfloor$.
Calling the hinted sum $\Sigma(n)$, we have
$\Sigma(m+n) - \Sigma(m) - \Sigma(n) = \sum_{d\in S(m,n)} \varphi(d)$.
On the other hand, we know from (4.54) that $\Sigma(n) = \frac12 n(n+1)$. Hence $\Sigma(m+n) - \Sigma(m) - \Sigma(n) = mn$.
4.58 The function $f(m)$ is multiplicative, and when $m = p^k$ it equals $1 + p + \cdots + p^k$. This is a power of 2 if and only if $p$ is a Mersenne prime and $k = 1$. For $k$ must be odd, and in that case the sum is
$(1+p)(1 + p^2 + p^4 + \cdots + p^{k-1})$
and $(k-1)/2$ must be odd, etc. The necessary and sufficient condition is that $m$ be a product of distinct Mersenne primes.
4.59 Proof of the hint: If $n = 1$ we have $x_1 = \alpha = 2$, so there’s no problem. If $n > 1$ we can assume that $x_1 \le \cdots \le x_n$.
Case 1: $x_1^{-1} + \cdots + x_{n-1}^{-1} + (x_n - 1)^{-1} \ge 1$ and $x_n > x_{n-1}$. Then we can find $\beta \ge x_n - 1 \ge x_{n-1}$ such that $x_1^{-1} + \cdots + x_{n-1}^{-1} + \beta^{-1} = 1$; hence $x_n \le \beta + 1 \le e_n$ and $x_1 \ldots x_n \le x_1 \ldots x_{n-1}(\beta+1) \le e_1 \ldots e_n$, by induction. There is a positive integer $m$ such that $\alpha = x_1 \ldots x_n/m$; hence $\alpha \le e_1 \ldots e_n = e_{n+1} - 1$, and we have $x_1 \ldots x_n(\alpha+1) \le e_1 \ldots e_n e_{n+1}$.
Case 2: $x_1^{-1} + \cdots + x_{n-1}^{-1} + (x_n - 1)^{-1} \ge 1$ and $x_n = x_{n-1}$. Let $a = x_n$ and $a^{-1} + (a-1)^{-1} = (a-2)^{-1} + \zeta^{-1}$. Then we can show that $a \ge 4$ and $(a-2)(\zeta+1) \ge a^2$. So there’s a $\beta \ge \zeta$ such that $x_1^{-1} + \cdots + x_{n-2}^{-1} + (a-2)^{-1} + \beta^{-1} = 1$; it follows by induction that $x_1 \ldots x_n \le x_1 \ldots x_{n-2}(a-2)(\zeta+1) \le x_1 \ldots x_{n-2}(a-2)(\beta+1) \le e_1 \ldots e_n$, and we can finish as before.
Case 3: $x_1^{-1} + \cdots + x_{n-1}^{-1} + (x_n - 1)^{-1} < 1$. Let $a = x_n$, and let $a^{-1} + \alpha^{-1} = (a-1)^{-1} + \beta^{-1}$. It can be shown that $(a-1)(\beta+1) > a(\alpha+1)$, because this identity is equivalent to
$a\alpha^2 - a^2\alpha + a\alpha - a^2 + \alpha + a > 0$,
which is a consequence of $a\alpha(\alpha - a) + (1+a)\alpha \ge (1+a)\alpha > a^2 - a$. Hence we can replace $x_n$ and $\alpha$ by $a - 1$ and $\beta$, repeating this transformation until cases 1 or 2 apply.
Another consequence of the hint is that $1/x_1 + \cdots + 1/x_n < 1$ implies $1/x_1 + \cdots + 1/x_n \le 1/e_1 + \cdots + 1/e_n$; see exercise 16.
4.60 The main point is that $\theta < \frac23$. Then we can take $p_1$ sufficiently large (to meet the conditions below) and $p_n$ to be the least prime greater than $p_{n-1}^3$. With this definition let $a_n = 3^{-n}\ln p_n$ and $b_n = 3^{-n}\ln(p_n + 1)$. If we can show that $a_{n-1} < a_n < b_n \le b_{n-1}$, we can take $P = \lim_{n\to\infty} e^{a_n}$ as in exercise 37. But this hypothesis is equivalent to $p_{n-1}^3 < p_n < (p_{n-1}+1)^3$. If there’s no prime $p_n$ in this range, there must be a prime $p < p_{n-1}^3$ such that $p + cp^\theta > (p_{n-1}+1)^3$. But this implies that $cp^\theta > 3p^{2/3}$, which is impossible when $p$ is sufficiently large.
We can almost certainly take $p_1 = 2$, since all available evidence indicates that the known bounds on gaps between primes are much weaker than the truth (see exercise 69). Then $p_2 = 11$, $p_3 = 1361$, $p_4 = 2521008887$, and $1.306377883863 < P < 1.306377883869$.
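The first few primes in the construction of answer 4.60, starting from $p_1 = 2$, are small enough to find by trial division. This is a sketch with illustrative function names; trial division is used only because the numbers involved are tiny, and it would not scale to $p_5$.

```python
def is_prime(n):
    """Primality by trial division; adequate for n up to a few billion."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

def cube_chain(count, p1=2):
    """p_1, ..., p_count where p_n is the least prime greater than p_{n-1}**3."""
    primes = [p1]
    while len(primes) < count:
        candidate = primes[-1] ** 3 + 1
        while not is_prime(candidate):
            candidate += 1
        primes.append(candidate)
    return primes
```

`cube_chain(4)` reproduces the values 2, 11, 1361, 2521008887 quoted in the answer.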
4.61 Let $\hat m$ and $\hat n$ be the right-hand sides; observe that $\hat m n' - m'\hat n = 1$, hence $\hat m \perp \hat n$. Also $\hat m/\hat n > m'/n'$ and $N = ((n+N)/n')n' - n \ge \hat n > ((n+N)/n' - 1)n' - n = N - n' \ge 0$. So we have $\hat m/\hat n \ge m''/n''$. If equality doesn’t hold, we have $n'' = (\hat m n' - m'\hat n)n'' = n'(\hat m n'' - m''\hat n) + \hat n(m''n' - m'n'') \ge n' + \hat n > N$, a contradiction.
I have discovered a wonderful proof of Fermat’s Last Theorem, but there’s no room for it here.
Therefore, if Fermat’s Last Theorem is false, the universe will not be big enough to write down any numbers that disprove it.
Incidentally, this exercise implies that $(m + m'')/(n + n'') = m'/n'$, although the former fraction is not always reduced.
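The recurrence discussed in answer 4.61 gives a simple way to list a Farey series from left to right. The sketch below assumes the recurrence has the form $m'' = \lfloor (n+N)/n'\rfloor m' - m$, $n'' = \lfloor (n+N)/n'\rfloor n' - n$, which is what the inequalities in the answer are bounding; the function name is illustrative.

```python
def farey(N):
    """Farey series of order N as (numerator, denominator) pairs, in increasing order."""
    m, n, m1, n1 = 0, 1, 1, N           # the first two elements are 0/1 and 1/N
    seq = [(m, n), (m1, n1)]
    while (m1, n1) != (1, 1):
        q = (n + N) // n1
        m, n, m1, n1 = m1, n1, q * m1 - m, q * n1 - n
        seq.append((m1, n1))
    return seq
```

For instance, `farey(5)` runs 0/1, 1/5, 1/4, 1/3, 2/5, 1/2, 3/5, 2/3, 3/4, 4/5, 1/1.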
4.62 $2^{-1} + 2^{-2} + 2^{-3} - 2^{-6} - 2^{-7} + 2^{-12} + 2^{-13} - 2^{-20} - 2^{-21} + 2^{-30} + 2^{-31} - 2^{-42} - 2^{-43} + \cdots$ can be written
$\frac12 + 3\sum_{k\ge0}\bigl(2^{-4k^2-6k-3} - 2^{-4k^2-10k-7}\bigr)$.
Incidentally, this sum can be expressed in closed form using the “theta func-
tion” O(z,
h)
=
tk
e~xhkz+2irk; we have
e t-3
i
+
~;6(~ln2,3iln2)
-
&O(%ln2,5iln2)
4.63 Any $n > 2$ either has an odd prime divisor $d$ or is divisible by $d = 4$. In either case, a solution with exponent $n$ implies a solution $(a^{n/d})^d + (b^{n/d})^d = (c^{n/d})^d$ with exponent $d$. Since $d = 4$ has no solutions, $d$ must be prime.
The hint follows from the binomial theorem, since
aP+(x-a)P-pap
is a multiple of $x$ when $p$ is odd. Assume that $a \perp x$. If $x$ is not divisible by $p$, $x$ is relatively prime to $c^p/x$; hence $x = m^p$ for some $m$. If $x$ is divisible by $p$, then $c^p/x$ is divisible by $p$ but not by $p^2$, and $c^p$ has no other factors in common with $x$.
(The values of a, b, c must, in fact, be even higher than this result
indicates! Inkeri
[160]
has proved that
A sketch of his proof appears in [249, pages 228-229], a book that contains an extensive survey of progress on Fermat’s Last Theorem.)
4.64 Equal fractions in
YN
appear in “organ-pipe order”:
2m 4m rm 3m m
--
--
2n’ 4n’ . . . .
ml
. . . . 3n’
n.
Suppose that
IPN
is correct; we want to prove that
&+I
is correct. This
means that if kN is odd, we want to show that
k-l
N+l
=
yN,kN;
if kN is even, we want to show that
k-l
yN,kN 1 yN,kN
~
N+l
yN,kN
YN,kN+l
*
In both cases it will be helpful to know the number of fractions that are
strictly less than (k
-
l)/(N + 1) in
LPN;
this is
1
=
i(kN-dtl),
d =
gcd(k-l,N+l),
by (3.32). Furthermore, the number of fractions equal to (k
-
l)/(N + 1) in
~PN
that should precede it in
iPN+l
is
i
(d
-
1
-
[d
even]), by the nature of
organ-pipe order.
IfkNisodd,thendisevenand(k-l)/(N+l)isprecededbyt(kN-l)
elements of
?N;
this is just the correct number to make things work. If
kN
is
even, than d is odd and (k
-
1
)/(
N
+
1)
is preceded by
i
(kN ) elements of
?N.
If d =
1,
none of these
equ’als
(k
-
l)/(N
+ 1) and
‘J’N,~N
is
‘<‘;
otherwise
(k- 1
)/(
N
+
1) falls between two equal elements and
~PN
,k~
is ‘=‘. (C. S. Peirce
[230]
independently discovered the Stern-Brocot tree at about the same time
as he discovered
?N.)
4.65 The analogous question for the (analogous) Fermat numbers $f_n$ is a famous unsolved problem. This one might be easier or harder.
“No square less than $25 \times 10^{14}$ divides a Euclid number.” -Ilan Vardi
4.66 It is known that no square less than $36 \times 10^{18}$ divides a Mersenne number or Fermat number. But there has still been no proof of Schinzel’s conjecture that there exist infinitely many squarefree Mersenne numbers. It is not even known if there are infinitely many $p$ such that $p \backslash\backslash (a \pm b)$, where all prime factors of $a$ and $b$ are $\le 31$.
4.67 M. Szegedy has proved this conjecture for all large $n$; see [284'], [77, pp. 78-79], and [49].
4.68 This is a much weaker conjecture than the result in the following exercise.
[56]
showed
t’hat
this conjecture is plausible on probabilistic
grounds, and computational experience bears this out: Brent
[32]
has shown
that
P,+l
-
P,
< 602 for
Pn+l
< 2.686 x
1012.
But the much weaker bounds
in exercise 60 are the best currently proved
[221].
Exercise 68 has a “yes”
answer if
P,+j-P,
<
2PA"
for all sufficiently large n. According to Guy
[139,
problem
A8],
Paul Erdas
offe:rs
$10,000 for proof that there are infinitely many
n such that
P
clnn
lnlnn lnlnlnlnn
n+l
-P,>
___
(lnlnlnn)2
A ANSWERS TO EXERCISES 511
for all c > 0.
4.70 This holds if and only if $\nu_2(n) = \nu_3(n)$, according to exercise 24. The methods of [78] may help to crack this conjecture.
4.71 When $k = 3$ the smallest solution is $n = 4700063497 = 19 \cdot 47 \cdot 5263229$; no other solutions are known in this case.
4.72 This is known to be true for infinitely many values of $a$, including $-1$ (of course) and 0 (not so obviously). Lehmer [199] has a famous conjecture that $\varphi(n) \backslash (n-1)$ if and only if $n$ is prime.
4.73 This is known to be equivalent to the Riemann hypothesis (that all zeros of the complex zeta function with real part between 0 and 1 have real part equal to 1/2).
What’s $11^4$ in radix 11?
4.74 Experimental evidence suggests that there are about $p(1 - 1/e)$ distinct values, just as if the factorials were randomly distributed modulo $p$.
5.1 $(11)_r^4 = (14641)_r$, in any number system of radix $r \ge 7$, because of the binomial theorem.
5.2 The ratio $\binom{n}{k+1}/\binom{n}{k} = (n-k)/(k+1)$ is $\le 1$ when $k \ge \lfloor n/2\rfloor$ and $\ge 1$ when $k < \lceil n/2\rceil$, so the maximum occurs when $k = \lfloor n/2\rfloor$ and $k = \lceil n/2\rceil$.
5.3 Expand into factorials. Both products are equal to $f(n)/f(n-k)f(k)$, where $f(n) = (n+1)!\,n!\,(n-1)!$.
5.4 $\binom{-1}{k} = (-1)^k\binom{k+1-1}{k} = (-1)^k\binom{k}{k} = (-1)^k[k \ge 0]$.
5.5 If $0 < k < p$, there’s a $p$ in the numerator of $\binom{p}{k}$ with nothing to cancel it in the denominator. Since $\binom{p}{k} = \binom{p-1}{k} + \binom{p-1}{k-1}$, we must have $\binom{p-1}{k} \equiv (-1)^k \pmod p$, for $0 \le k < p$.
5.6
The crucial step (after second down) should be
The original derivation forgot to include this extra term, which is $[n = 0]$.
5.7
Yes, because
rs
=
(--1 )
"/(
-r
-
1)".
We also have
rqr
+ i)" =
(2r)9;!%
5.8 $f(k) = (k/n - 1)^n$ is a polynomial of degree $n$ whose leading coefficient is $n^{-n}$. By (5.40), the sum is $n!/n^n$. When $n$ is large, Stirling’s approximation says that this is approximately $\sqrt{2\pi n}/e^n$. (This is quite different from $(1 - 1/e)$, which is what we get if we use the approximation $(1 - k/n)^n \sim e^{-k}$, valid for fixed $k$ as $n \to \infty$.)
5.9 $\mathcal{E}_t(z)^t = \sum_{k\ge0} t(tk+t)^{k-1}z^k/k! = \sum_{k\ge0} (k+1)^{k-1}(tz)^k/k! = \mathcal{E}_1(tz)$, by (5.60).
5.10 $\sum_{k\ge0} 2z^k/(k+2) = F(2, 1; 3; z)$, since $t_{k+1}/t_k = (k+2)z/(k+3)$.
5.11 The first is Besselian and the second is Gaussian:
$z^{-1}\sin z = \sum_{k\ge0} (-1)^k z^{2k}/(2k+1)! = F(1; 1, \tfrac32; -z^2/4)$;
$z^{-1}\arcsin z = \sum_{k\ge0} z^{2k}\,(\tfrac12)^{\overline{k}}/(2k+1)k! = F(\tfrac12, \tfrac12; \tfrac32; z^2)$.
But not Imbesselian.
5.12 (a) Yes, the term ratio is $n$. (b) No, the value should be 1 when $k = 0$; but $(k+1)^n$ works, if $n$ is an integer. (c) Yes, the term ratio is $(k+1)(k+3)/(k+2)$. (d) No, the term ratio is $1 + 1/(k+1)H_k$; and $H_k \sim \ln k$ isn’t a rational function. (e) Yes, the term ratio is
$\dfrac{t(k+1)}{t(k)} \Big/ \dfrac{T(n-k)}{T(n-k-1)}$.
(f) Not always; e.g., not when $t(k) = 2^k$ and $T(k) = 1$. (g) Yes, the term ratio can be written
$\dfrac{a\,t(k+1)/t(k) + b\,t(k+2)/t(k) + c\,t(k+3)/t(k)}{a + b\,t(k+1)/t(k) + c\,t(k+2)/t(k)}$
and $t(k+m)/t(k) = \bigl(t(k+m)/t(k+m-1)\bigr) \ldots \bigl(t(k+1)/t(k)\bigr)$ is a rational function of $k$.
5.13 $R_n = n!^{\,n+1}/P_n^2 = Q_n/P_n = Q_n^2/n!^{\,n+1}$.
5.14 The first factor in (5.25) is
(,‘i
k,)
when k < 1, and this is (-1
)Lpkpm
x
(r-r::).
The sum for k 6
1
is the sum over all k, since m 3
0.
(The condition
n 3 0 isn’t really needed, although k must assume negative values if n < 0.)
To go from (5.25) to (5.26), first replace s by -1 -n
-
q.
5.15 If $n$ is odd, the sum is zero, since we can replace $k$ by $n - k$. If $n = 2m$, the sum is $(-1)^m(3m)!/m!^3$, by (5.29) with $a = b = c = m$.
5.16 This is just $(2a)!\,(2b)!\,(2c)!/(a+b)!\,(b+c)!\,(c+a)!$ times (5.29), if we write the summands in terms of factorials.
5.17 $\binom{2n-1/2}{n} = \binom{4n}{2n}/2^{2n}$; $\binom{2n-1/2}{2n} = \binom{4n}{2n}/2^{4n}$; so $\binom{2n-1/2}{n} = 2^{2n}\binom{2n-1/2}{2n}$.
(:;)(,"k",k)i33k.
5.19
Bl
.t(-2)
:=
tkzO
(kP’,“P’)
(-l/(k
~ tk
-
1)) (-.z)~, by (5.60), and
this is tkaO
(tt)(l/(tk-
k+
1))~~
=
‘BH,(z).
5.20 It equals $F(-a_1, \ldots, -a_m; -b_1, \ldots, -b_n; (-1)^{m+n}z)$; see exercise 2.17.
5.21 $\lim_{n\to\infty} (n+m)^{\underline{m}}/n^m = 1$.
5.22 Multiplying and dividing instances of (5.83) gives
(-l/2)!
x!(x-l/2)!
=
&c
("n'")
(n+xc1'2)n
2r/(n-i'2)
= lim
(
>
2n + 2x
n--2X
n+cc
2n
by (5.34) and (5.36). Also
1/(2x)! = lim
2n+2x
(
)
2n
(2n)
-2x
.
n--c%
Hence, etc. The Gamma function equivalent, incidentally, is
$\Gamma(x)\,\Gamma(x+\tfrac12) = \Gamma(2x)\,\Gamma(\tfrac12)/2^{2x-1}$.
5.23 $(-1)^n\,n¡$, see (5.50).
5.24 This sum is (1:) F(
",~~"ll>
=
(fz),
by
(5.35)
and
(5.93).
5.25
This is equivalent to the easily proved identity
a’
(a+llEemb<
(a-b)-- =
oP
(b +
II)”
(b+l)k
bk
as well as to the operator formula a
-
b = (4 + a)
-
(4 + b).
Similarly, we have
(al
-
a21 F
al,a2,a3,
.
.
.
.
am
bl,
.
.
.
,
bn
= alF
al+l,a2,a3,...,am
13
-wF(
al,a2+l,a3,...,am
bl,
. , b,
because
al
-
a2
=
(a1
+ k)
-~
(al + k). If al
-
bl
is a nonnegative integer d,
this second identity allows us to express F(al , , . , a,,,;
bl
, . . . ,
b,;
z) as a lin-
ear combination of F(
a2
+ j, a3, . . .
, a,,,; b2, . ,
b,;
z) for 0 6 j 6 d, thereby
eliminating an upper parameter and a lower parameter. Thus, for example,
we get closed forms for F( a, b; a
-
1; z), F(
a, b; a
-
2; z), etc.
Gauss [116, §7] derived analogous relations between $F(a, b; c; z)$ and any two “contiguous” hypergeometrics in which a parameter has been changed by $\pm1$. Rainville [242'] generalized this to cases with more parameters.
5.26 If the term ratio in the original hypergeometric series is
tkfl
/tk = r(k),
the term
ratio
in the new one is tk+I/tk+l =
r(k
+ 1). Hence
F
(
al,
. . . ,
a, al+l,...,a,+l,l
bl,
. . . .
b,
1)
Z
=
1
+
"-'amzF
bl
.
..b.
(
b
+,
1
,...!
b,+l,2
1)
5.27 This is the sum of the even terms of
F(2a1,.
. .
,2a,;
2bl,.
. .
,2b,;
z).
We have
(2a)=
/(2a)X
=
4(k+
a)(k+ a +
i),
etc.
5.28
WehaveF(“;b]z)=
(I-z)~"F(~~~~"~~)=
(l-zzpuF('
,"sals)=
(,
mz)c
a
bF(C-yb
1~).
(Euler proved the identity by showing that both
sides satisfy the same differential equation. The reflection law is often at-
tributed to Euler, but it does not seem to appear in his published papers.)
5.29 The coefficients of $z^n$ are equal, by Vandermonde’s convolution. (Kummer’s original proof was different: He considered $\lim_{m\to\infty} F(m, b-a; b; z/m)$ in the reflection law (5.101).)
5.30 Differentiate again to get $z(1-z)F''(z) + (2-3z)F'(z) - F(z) = 0$. Therefore $F(z) = F(1, 1; 2; z)$ by (5.108).
5.31 The condition $f(k) = cT(k+1) - cT(k)$ implies that $f(k+1)/f(k) = \bigl(T(k+2)/T(k+1) - 1\bigr)/\bigl(1 - T(k)/T(k+1)\bigr)$ is a rational function of $k$.
5.32 When summing a polynomial in $k$, Gosper’s method reduces to the “method of undetermined coefficients.” We have $q(k) = r(k) = 1$, and we try to solve $p(k) = s(k+1) - s(k)$. The method suggests letting $s(k)$ be a polynomial whose degree is $d = \deg(p) + 1$.
5.33 The solution to $k = (k-1)s(k+1) - (k+1)s(k)$ is $s(k) = -k + \frac12$; hence the answer is $(1-2k)/2k(k-1) + C$.
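Both formulas in answer 5.33 can be checked with exact rational arithmetic. The sketch below assumes that the summand of the exercise is $1/(k^2-1)$, which is what the stated answer differences to; the function names are illustrative.

```python
from fractions import Fraction

def s(k):
    """The solution s(k) = -k + 1/2 of k = (k-1)s(k+1) - (k+1)s(k)."""
    return Fraction(1, 2) - k

def antidifference(k):
    """The claimed indefinite sum (1 - 2k) / (2k(k - 1)), with C = 0."""
    return Fraction(1 - 2 * k, 2 * k * (k - 1))

def summand(k):
    """Assumed summand 1/(k^2 - 1)."""
    return Fraction(1, k * k - 1)
```

For every integer $k \ge 2$ the difference `antidifference(k + 1) - antidifference(k)` equals `summand(k)`.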
5.34 The limiting relation holds because all terms for k > c vanish, and
E
-
c cancels with
-c
in the limit of the other terms. Therefore the second
partial sum is
lim,,o
F(-m,--n;
e-mm;l)
=
lim,,~(e+n-m)m/(e-m)m
=
(-l)yy).
5.35 (a)
2m"3n[n>0].
(b)
11
-
i)PkP’[k>,O]
=2k+‘[k>0].
The boxed
sentence
on the
other side
of this page
is true.
5.36 The sum of the digits of $m + n$ is the sum of the digits of $m$ plus the sum of the digits of $n$, minus $p - 1$ times the number of carries, because each carry decreases the digit sum by $p - 1$.
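The digit-sum bookkeeping of answer 5.36 is easy to test, together with the well-known consequence (Kummer's theorem) that the number of carries equals the exponent of $p$ in $\binom{m+n}{m}$. This is a sketch with illustrative names; `math.comb` requires Python 3.8 or later.

```python
from math import comb

def digit_sum(n, p):
    """Sum of the radix-p digits of n."""
    s = 0
    while n > 0:
        s += n % p
        n //= p
    return s

def carries(m, n, p):
    """Number of carries when m and n are added in radix p."""
    count = carry = 0
    while m > 0 or n > 0 or carry:
        carry = 1 if m % p + n % p + carry >= p else 0
        count += carry
        m //= p
        n //= p
    return count

def exponent(x, p):
    """Largest e such that p**e divides the positive integer x."""
    e = 0
    while x % p == 0:
        x //= p
        e += 1
    return e
```

For example, adding 7 and 9 in binary produces four carries, and $\binom{16}{7} = 11440$ is divisible by $2^4$ but not by $2^5$.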
5.37 Dividing the first identity by n! yields (‘ly) =
tk
(i) (,yk),
Van-
dermonde’s convolution. The second identity follows, for example, from the
formula
xk
=
(-l)k(-~)”
if we negate both x and
y.
5.38 Choose $c$ as large as possible such that $\binom{c}{3} \le n$. Then $0 \le n - \binom{c}{3} < \binom{c+1}{3} - \binom{c}{3} = \binom{c}{2}$; replace $n$ by $n - \binom{c}{3}$ and continue in the same fashion. Conversely, any such representation is obtained in this way. (We can do the same thing with
$n = \binom{a_1}{1} + \binom{a_2}{2} + \cdots + \binom{a_m}{m}$, $0 \le a_1 < a_2 < \cdots < a_m$
for any fixed $m$.)
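The greedy procedure of answer 5.38 can be written out directly. The sketch below handles the case $m = 3$ and assumes the representation sought is $n = \binom{a}{1} + \binom{b}{2} + \binom{c}{3}$ with $0 \le a < b < c$; the function name is illustrative.

```python
from math import comb

def represent(n):
    """Return (a, b, c) with n = C(a,1) + C(b,2) + C(c,3) and 0 <= a < b < c."""
    parts = []
    for j in (3, 2, 1):
        x = j - 1                       # C(j-1, j) = 0, so this choice always fits
        while comb(x + 1, j) <= n:      # take x as large as possible with C(x, j) <= n
            x += 1
        parts.append(x)
        n -= comb(x, j)
    c, b, a = parts
    return a, b, c
```

For instance, `represent(0)` is `(0, 1, 2)` and `represent(10)` is `(0, 1, 5)`, since $\binom53 = 10$.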
5.39 xmyn =
~~=,
(“‘+l-:
k)anbm~kxk
t-
I;=,
(m+~~~~~k)an
kbmyk
for
all mn > 0, by induction on m + n.
5.40 (-l)m+’
x;=,
I;,
(;)(”
y)
= (--l)m+’
I;=,((”
“k;i
S-l)-
(-y))
=
(-qm+l((rn
i-1
)
-
("--'y1))
=
('2")
_
(L),
5.41
tkaOn!/(n
-
k)! (n + k +
l)!
=:
(n!/(2n
+
l)!)
tk>,,
(‘“k+‘),
which is
2’%!/(2n
+ 1
)!.
5.42 We treat n as an indeterminate real variable. Gosper’s method with
q(k) = k + 1 and r(k) = k
-
1 -n has the solution s(k) =
l/(n
+ 2); hence
the desired indefinite sum is (-1
)XP’
$$/(“z’).
And
This exercise, incidentally, implies the formula
1
ZZ
n-l
n
(
)
k
(n+ll’(kyl)
+
(n+;)(L)
a “dual” to the basic recurrence (5.8).
5.43 After the hinted first step we can apply (5.21) and sum on $k$. Then (5.21) applies again and Vandermonde’s convolution finishes the job. (A combinatorial proof of this identity has been given by Andrews [10]. There’s a quick way to go from this identity to a proof of (5.29), explained in [173, exercise 1.2.6-62].)
5.44 Cancellation of factorials shows that
(;)(;)(“;“)
==
(m+;:;-k)(j;k)(y;;).
so the second sum is
l/(
“:“I
I times the first. We can show that the first sum
is (“ib) (“-~P~~b),
whenever n 3 b, even if m < a: Let a and b be fixed
and call the first sum S( m, ‘1). Identity (5.32) covers the case n = b, and
we have S(m,n) =
S(m,n
-
1) + S(m- 1,n) +
(-l)m+n(mzn)(i)(“)
since
(m+;:;-k)
=
r+:;J;j--k)
+
(m-;‘;r;-k).
The result follows by induction on
m+ n, since
(,)
= 0 when n > b and the case m = 0 is trivial. By symmetry,
the formula (“ib) (m+l:im
“)
holds whenever m > a, even if n < b.
5.45 According to (5.9), $\sum_{k\le n}\binom{k-1/2}{k} = \binom{n+1/2}{n}$. If this form isn’t “closed” enough, we can apply (5.35) and get $(2n+1)\binom{2n}{n}4^{-n}$.
5.46 By (5.6g), this convolution is the negative of the coefficient of
z2*
in
%‘(z)K’(-z).
Now (223-‘(z)
-
1)(2%‘(-2)
-
1) =
dm;
hence
‘K’(z)‘E_
i-z)
=
ad-7
+
i’%‘(z)
+
iZ3
-1(-z)
-
$. By the binomial
theorem,
(1
-
16~~)"~ =
xc
1
n
'f
(-16)“z2”
=
-t
(;)
g,
n
so the answer is (2~)4”~‘/(2r~
-
1) +
(4;;1)/(4n
-
1).
5.47 It’s the coefficient of z” in (IBr(~)“/Qr(~))(~Br(~)~s/Qr(~)) =
l/Q,(z)‘,
where Qr(z) = 1
-r
+
rBB,(z'i
',
by (5.61).
5.48 F(2n + 2,1; n + 2;
i)
==
22n+‘/(2z:;), a special case of (5.111).
5.49 Saalschiitz’s identity (5.97) yields
5.50 The left-hand side is
k+a+m-1
zm
m
and the coefficient of 2” is
by Vandermonde’s convolution (5.92).
5.51 (a) Reflection gives F(a,
-n;
2a; 2) = (-1 )“F( a,
-n;
2a; 2). (Inciden-
tally, this formula implies the remarkable identity
A2”‘+’
f(0) = 0, when
f(n) = 2nxc/(2x)“.>~
(b) The term-by-term limit is
&kSm
(r)
m(-2)k
plus an addi-
tional term for k = 2m
-
1: the additional term is
(-m)... (-1) (1)...(m)
(-2m+
1) . . . (-1)22m+’
I:-2m).
(-1) (2m
-
l)!
,I
,I
pm+1
=
(-ltm+'*
=-
-2
(CL')
'
hence, by (5.104), this limit is
-l/(
y2),
the negative of what we had.
5.52 The terms of both series are zero for k > N. This identity corresponds
to replacing k by N
-
k. Notice that
5.53 When b =
-i,
the left side of (5.110) is 1
-
22 and the right side is
(1
-42+422)"2, independent of a. The right side is the formal power series
l/2
l+
1
(
)
42(2-l)+
l/2
(
1
2
16z2(z-1)2+~~~,
which can be expanded and rearranged to give 1
-
22+
Oz2
+
Oz3
f.
;
but the
rearrangement involves divergent series in its intermediate steps when
z
=
1,
so it is not legitimate.
5.54 If m + n is odd, say 2N
-
1,
we want to show that
lim F
E'O
(
N-m-;,
-N+c
-m+e
1)
1
=o.
Equation (5.92) applies, since -m +
c
> -m
-
i +
E,
and the denominator
factor
T(c-b)
=
T(N-m)
is infinite since N < m; the other factors are finite.
Otherwise m + n is even; setting n = m ~ 2N we have
fi,mo
F
(
-N,
N-m-i+e
1)
1
=
(N-1/21N
-m+c
rnN
by (5.93). The remaining job is to show that
(N
-
l/2)!
(m-N)!
-(-l/2)!
m! =
and this is the case x = N of exercise 22.
5.55 Let Q(k) =
(k+Al)...(k+AM)Zand
R(k) =
(k+Bl)...(k+BN).
Then t(k+ 1)/t(k) =
P(k)Q(k-
l)/P(k-
l)R(k),
where P(k) = Q(k) -R(k)
is a nonzero polynomial.
5.56 The solution to
-(k+l)(k+2)
=
s(k+l)+s(k)
is s(k) =
-ik2-k-a;
hence
t
(~:)
6k=
i(-l)kp’(2k2
+4k+
1) + C. Also
(-l)k-’
=-
4
k-t
1
-
‘+‘r”*>
(,+,-
‘-)*)
2
=
v(2k2+4k+1)+;
5.57 We have
t&+1)/t(k)
=
(k-n)(k+l
+B)(-z)/(k+l)(k+O).
Therefore
we let p(k) = k+ 8, q(k) = (k-
n)(-z),
r(k) = k. The secret function s(k)
must be a constant
0~0,
and we have
k+B
=
(-z(k-n)--k)as;
hence
010
=
-l/(1
+ z) and
8
=
-nz/(l
+ z). The sum is
t
(;)zk(“-+6k
=
-&(;~;)z’+c
(The special case z = 1 was mentioned in (5.18); the general case is equivalent
to
(5.1311.)
5.58 If m > 0 we can replace
(:)
by
$,
(;I\)
and derive the formula
T,,,
=
$T,,-I,~-~
-
6
(“i’).
The summation factor
(t)-’
is therefore appropriate:
We can unfold this to get
T
m,n
-
=
To,n-m
-
H,
+
H,
-
H,-,
.
Lx
Finally
To,~
,,,
=
H,.
,,,,
so
T,,,,
=
(z)
(H,
-H,).
(It’s also possible to derive
this result by using generating functions; see Example
2
in Section 7.5.)
5.59
t.
)*O,kal
(y)[j=Llognrkj] =
ti>0,k>,
(~)[m’<k<mj+‘l, which is
tj>O
(‘j’)(mj+’
-
mj) =
(m--
l)tjao
(3)mj
= (m-
l)(m+
l)n.
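5.60 $\binom{2n}{n} \approx 4^n/\sqrt{\pi n}$ is the case $m = n$ of
$\binom{m+n}{n} \approx \sqrt{\dfrac{1}{2\pi}\Bigl(\dfrac1m + \dfrac1n\Bigr)}\;\Bigl(1 + \dfrac mn\Bigr)^{n}\Bigl(1 + \dfrac nm\Bigr)^{m}$.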
5.61 Let $\lfloor n/p\rfloor = q$ and $n \bmod p = r$. The polynomial identity
$(x+1)^p \equiv x^p + 1$ (mod $p$) implies that
$(x+1)^{pq+r} \equiv (x+1)^r (x^p+1)^q$ (mod $p$).
The coefficient of $x^m$ on the left is $\binom{n}{m}$. On the right it's
$\sum_k \binom{r}{m-pk}\binom{q}{k}$, which is just
$\binom{r}{m \bmod p}\binom{q}{\lfloor m/p\rfloor}$ because $0 \le r < p$.
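The congruence in answer 5.61 can be spot-checked numerically. The following is only a sketch: the function names and the test ranges are illustrative choices, not part of the text.

```python
from math import comb

def lucas_step_holds(n, m, p):
    """Check C(n, m) == C(n mod p, m mod p) * C(n div p, m div p)  (mod p)."""
    q, r = divmod(n, p)
    return comb(n, m) % p == (comb(r, m % p) * comb(q, m // p)) % p

def check_lucas(primes=(2, 3, 5, 7), limit=60):
    """Try every 0 <= m <= n < limit for each prime in the list."""
    return all(lucas_step_holds(n, m, p)
               for p in primes
               for n in range(limit)
               for m in range(n + 1))
```

A finite check like this proves nothing, but it catches a mistranscribed index quickly.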
5.62 $\binom{np}{mp} = \sum_{k_1+\cdots+k_n=mp} \binom{p}{k_1}\cdots\binom{p}{k_n}
\equiv \binom{n}{m}$ (mod $p^2$), because all terms
of the sum are multiples of $p^2$ except the $\binom{n}{m}$ terms in which exactly $m$ of the
$k$'s are equal to $p$. (Stanley [275, exercise 1.6(d)] shows that the congruence
actually holds modulo $p^3$ when $p > 3$.)
5.63 This is $S_n = \sum_{k=0}^{n} (-4)^k \binom{n+k}{2k} = \sum_{k=0}^{n} (-4)^{n-k}\binom{2n-k}{k}$.
The denominator of (5.74) is zero when $z = -1/4$, so we can't simply plug into that formula.
The recurrence $S_n = -2S_{n-1} - S_{n-2}$ leads to the solution $S_n = (-1)^n(2n+1)$.
5.64
~,,,((;k) +
(2;+,))/@+
1)
=
&O
(;$,)/(k+
11,
which
is
A.&
(g;:)
= '",';;"
,
5.65 Multiply both sides by nn-’ and replace k by n
-
1
-
k to get
x
(7Y)
n-1
nk(n
-
k)! = (n
-
l)!
Z(nkf’/k!
-
nk/(k
-
l.)!)
k
k=O
=
(n-l)!nn/(n-l)!.
(The partial sums can, in fact, be found by Gosper’s algorithm.) Alternatively,
(2
knnPlekk! can be interpreted as the number of mappings of
{l
, . . . , n} into
itself with
f
(1))
. . . ,
f(k)distinctbutf(k+l) l {f(l),...,f(k)};summingonk
must give nn.
5.66 This is a
“wa.lk
the garden path” problem where there’s only one “ob-
vious” way to proceed at every step. First replace k
-
j by
1,
then replace
[A]
by k, getting
j&o
(j:‘k)
(A)
y
I ,
The infinite series converges because the terms for fixed j are dominated by
a polynomial in j divided by 2j. Now sum over k, getting
Absorb the j + 1 and apply (5.57) to get the answer,
4(m+
1).
5.67
3(2nntt52)
by (5.26), because
(‘i’)
=
3(y).
5.68 Using the fact that
we get
“(2”
-
(,zij,)).
[n is even] ,
5.69 Since
(k:‘)
+ (‘y’) < (:) + (i)
W
k < 1, the minimum occurs
when the k’s are as equal as possible. Hence, by the equipartition formula of
Chapter 3, the minimum is
(n mod m) -t (n
-
(n mod m))
b/ml
-4
>
2
$-
(n
mod m)
:
.
L
i
A similar result holds for any lower index in place of 2.
5.70 This is F(-n,
i;
1;2); but it’s also
(-2)Pn(F)F(-n,
-n;
i
-n;
i)
if we
replacekbyn-k.
NowF(-n,-n;i-n;:)
=F(-f,-l;&n;l)byGauss’s
identity (5.111). (Alternatively, F(-n,-n;
i-n;
i)
=
2-“F(-n,
i;
i-n;
-1)
by the reflection law (5.101), and Kummer’s formula (5.94) relates this to
(5.55).) The answer is 0 when n is odd, 2-“(,,y2) when n is even. (See
[134,
$1.21
for another derivation. This sum arises in the study of a simple search
algorithm [
1641.)
5.71 (a) S(z) = EkZO okzm-+k/(l -Z)m+Zk+’ = Zm(l
-2)
-“-‘A@/(1
-z)‘).
(b) Here A(z) =
x
k20
(2,“)(-z)k/(k + 1) =
(dm
-
1)/2z,
so we have
A(z/(l
-z)‘)
= 1
-z.
Thus
S,
=
[z”]
(z/(1
-
2))“’ =
(;I;).
5.72 The stated quantity is m(m
-
n) . . . (m
-
(k
-
l)n)nkPYik’/k!. Any
prime divisor p of n divides the numerator at least k
-
y(k) times and di-
vides the denominator at most k
-
v(k) times, since this is the number of
times 2 divides k!.
A
prime p that does not divide n must divide the prod-
n at eas as often as it divides k!, because
uc;-tL;)-n)...(m-(k-l)
)
1 t
.
..(m-(p’-1)
)’
n
1s
a multiple of
p’
for all
r
3 1 and all m.
5.73 Plugging in $X_n = n!$ yields $\alpha = \beta = 1$; plugging in $X_n = n¡$ yields
$\alpha = 1$, $\beta = 0$. Therefore the general solution is $X_n = \alpha n¡ + \beta(n! - n¡)$.
5.74
(“l’)
-
(;I:),
for 1 6 k 6 n.
5.75 The recurrence $S_k(n+1) = S_k(n) + S_{(k-1) \bmod 3}(n)$ makes it possible to
verify inductively that two of the $S$'s are equal and that $S_{(-n) \bmod 3}(n)$ differs
from them by $(-1)^n$. These three values split their sum $S_0(n) + S_1(n) + S_2(n) = 2^n$
as equally as possible, so there must be $2^n \bmod 3$ occurrences of
$\lceil 2^n/3\rceil$ and $3 - (2^n \bmod 3)$ occurrences of $\lfloor 2^n/3\rfloor$.
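The three claims of answer 5.75 are easy to test for small n. This sketch assumes $S_k(n)$ means the sum of $\binom{n}{j}$ over all $j \equiv k$ (mod 3); the helper names are made up for illustration.

```python
from math import comb

def wraparound_sums(n):
    """Return [S_0(n), S_1(n), S_2(n)], assuming S_k(n) = sum of C(n, j), j = k mod 3."""
    return [sum(comb(n, j) for j in range(k, n + 1, 3)) for k in range(3)]

def split_is_as_equal_as_possible(n):
    s = wraparound_sums(n)
    total = 2 ** n
    ceil_third, floor_third = -(-total // 3), total // 3
    expected = [ceil_third] * (total % 3) + [floor_third] * (3 - total % 3)
    odd_index = (-n) % 3                      # the index singled out in the answer
    others = [s[k] for k in range(3) if k != odd_index]
    return (sorted(s) == sorted(expected)     # ceiling/floor multiplicities
            and others[0] == others[1]        # two of the S's are equal
            and s[odd_index] - others[0] == (-1) ** n)
```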
5.76
Qn,k
=
(n
f
1
l(c) +
(kn+,)’
5.77 The terms are zero unless kl 6 .. <
k,,
when the product is the
multinomial coefficient
(
km
kl,
kz
-
kl,
. . . ,
k,
-
k,pl
>
Therefore the sum over kl , . . . ,
k,-l
is
mkm
, and the final sum over
k,
yields
(
mn+’
-
l)/(m- 1).
5.78 Extend the sum to k = 2m2 + m
-
1; the new terms are
(1)
+
(‘,)
+
...-t
(1;‘)
= 0. Since m
I
(2m+
l),
the pairs (kmod m,kmod
(2m-t
1))
are distinct. Furthermore, the numbers (2j + 1) mod
(2m+
1) as j varies from
0 to 2m are the numbers 0,
1,
. . . ,
2m in some order. Hence the sum is
5.79 (a) The sum is $2^{2n-1}$, so the gcd must be a power of 2. If $n = 2^k q$ where
$q$ is odd, $\binom{2n}{1}$ is divisible by $2^{k+1}$ and not by $2^{k+2}$. Each $\binom{2n}{2j+1}$ is divisible
by $2^{k+1}$ (see exercise 36), so this must be the gcd. (b) If $p^r \le n + 1 < p^{r+1}$,
we get the most radix $p$ carries by adding $k$ to $n - k$ when $k = p^r - 1$. The
number of carries in this case is $r - \epsilon_p(n+1)$, and $r = \epsilon_p(L(n+1))$.
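Both parts of answer 5.79 can be checked by brute force. This sketch assumes that part (a) asks for the gcd of the odd-indexed entries of row 2n of Pascal's triangle, and that part (b) is the identity lcm of row n = L(n+1)/(n+1) with L(n) = lcm(1, ..., n). All function names are illustrative.

```python
from functools import reduce
from math import comb, gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def gcd_odd_binomials(n):
    """gcd of C(2n,1), C(2n,3), ..., C(2n,2n-1)."""
    return reduce(gcd, (comb(2 * n, j) for j in range(1, 2 * n, 2)))

def predicted_gcd(n):
    """2^(k+1), where n = 2^k * q with q odd."""
    k = 0
    while n % 2 == 0:
        n //= 2
        k += 1
    return 2 ** (k + 1)

def lcm_of_row(n):
    """lcm of C(n,0), ..., C(n,n)."""
    return reduce(lcm, (comb(n, k) for k in range(n + 1)))

def L(n):
    """lcm(1, 2, ..., n)."""
    return reduce(lcm, range(1, n + 1), 1)
```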
5.80 First prove by induction that $k! \ge (k/e)^k$.
5.81 Let fL,m,n(x) be the left-hand side. It is sufficient to show that we have
fl,,,,(l)
> 0 and tlhat
f;,,,,(x)
< 0 for 0 < x 6 1. The value of
fl,,,,(l)
is
(-l)"p"p'(':~~")
by
(5.23),
and this is positive because the binomial
coefficient has exactly n
-
m- 1 negative factors. The inequality is true when
1
= 0, for the same reason. If
1
> 0, we have
f&,+(x)
=
-Iftpl,m,n+l(~),
which is negative by induction.
5.82 Let ~,,(a) be the exponent by which the prime p divides a, and let
m = n
-
k. The identity to be proved reduces to
For brevity let’s write this as
min(x,,yl,zl)
=:
min(xz,y2,z2).
Notice that
x1
+
y,
+
z1
= x2 +
y2
+
22.
The general relation
+(a)
<
e,(b)
=+
e,,(a) =
eP(/u*bl)
allows us to conclude that x.1 # x2
==+
min(x,
,x2) = 0; the same holds also
for
(~1,
y.7)
and (2, ,22). It’s now a simple matter to complete the proof.
5.83 If m < n, the quantity (j:“)
(“‘?:iPk)
is a polynomial in k of degree
less than n, for each fixed
.i;
hence the sum over k is zero. If m 3 n and
if
r
is an integer in the range n <
r
6 m, the quantity
(‘+kk)
(m+c:iPk) is a
polynomial in j of degree less than r, for each fixed k; hence the sum over j is
zero. If m 3 n and if
r
= -d
-
1 is an integer, for 0 6 d < n, we have
(;)
=
(w(qd)
=
(-lIq;)(;);
hence the given sum can be written
pk(i;k)(;)(:)(;I)(m+;lI:-k)
,
,
=
pk(;)
(3
(‘:“)
(jy)
(m+;:;
-“>
=
&,,+m+-l
n
j,kL
(k)(;)(‘:“)(-‘i”l’)(-“m’*,‘)
=
xc-1
)k+mi-L
k,i
(;)
(3
(7”)
(-‘mn12).
This is zero since (I:“) is a polynomial in k of degree d < n.
If m 3 n, we have verified the identity for m different values of r. We
need consider only one more case to prove it in general. Let
r
= 0; then j = 0
and the sum is
pk(;)
(-+;;-
“) =
(3
by (5.25). (Is there a substantially shorter proof?)
5.84 Following the hint, we get
andasimilarformulafor&,(z).
Thustheformulas
(ztB;‘(z)‘B[(z)+l)Bt(z)r
and
(ztE;‘(z)&:(z)
+ l)&,(z)’ give the respective right-hand sides of (5.61).
We must therefore prove that
(zwwJm
+
l)%w-
=
1
_ t
+
:‘%
(z)
q
,
t
(zw4~:M
+ 1)Wz)’ = ,
&Z)t
,
and these follow from (5.59).
5.85 If f(x) =
a,x”
+
...
+ a’x +
a0
is any polynomial of degree < n, we
can prove inductively that
x
c-1
1
“+“‘+‘“f(e1x,
+...+E,x,)
=
(-l)nn!~,I~l
.
..x..
O$f,
,...,
E$,$l
The stated identity is the special case where a, = 1 /n! and
Xk
= k3.
5.86 (a) First expand with n(n- 1) index variables
Lij
for all i # j. Setting
kii = li’
-Lji
for 1
:<
i < j < n and using the constraints
tifi
(lij
-iii)
= 0 for
all i < n allows us to carry out the sums on
li,
for 1 6 j < n and then on
iii
for 1 < i < j < n by Vandermonde’s convolution. (b) f(z)
-
1 is a polynomial
of degree < n that has n roots, so it must be zero. (c) Consider the constant
terms in
,jJsn
(1
--
;)“’
=
g
JIn
(1
-
;)“‘~
(Y
,
,
,
ifi
i#i
5.87 The first term is
t,
(n;k)zmk, by (5.61). The summands in the second
term are
1
-
m
EC
k20
(n+
1)/m;
(l+l/m)k)iiz),.;,
1
= --
m
(‘+‘;~~~;‘-‘)(i,jk.
Since ~06j<m(<‘i+‘)k =
m(--l)‘[k=mL],
these terms sum to
XC
(l+l/mW
-n-
1
(-z”)k
mk-n--l
k>n/m
)
=Q
(m+l)k-n-
1
k
k>n/m
)
(-zm)k
=
t
(”
-kmk)pk
k>n/m
Incidentally, the functions ‘B,,,(zm) and L2i+‘z~B1+II,(L2~+‘~)‘~m are the m+l
complex roots of the equation w”‘+’
-
wm = z”l.
5.88 Use the facts that
Jr(e
it
-
e
nt)
dt/t
= Inn and (1
-
ee’)/t
$ 1.
(We have
(“,)
= O(kmx ‘)
a,s
k
+
00, by (5.83); so this bound implies that
Stirling’s series
tk
sk
(i)
converges when
x
> -1. Hermite
[155]
showed that
the sum is In r( 1 + x).)
5.89 Adding this to (5.19) gives
~~‘(x+y)~+’
on both sides, by the binomial
theorem. Differentiation gives
I
sentence
on
the
other side
of this page
is not
self-
referentia/.
and we can replace k by k + m + 1 and apply (5.15) to get
&
(m;::
k)
(-‘I;
‘)
(-X)m+l+ky--l-k-n
In hypergeometric form, this reduces to
which is the special case (a, b, c, z) = (n +
1,
m + 1 +
r,
m + 2, -x/y) of the
reflection law (5.101). (Thus (5.105) is related to reflection and to the formula
in exercise 52.)
5.90 If $r$ is a nonnegative integer, the sum is finite, and the derivation in
the text is valid as long as none of the terms of the sum for $0 \le k \le r$ has
zero in the denominator. Otherwise the sum is infinite, and the kth term
(k
ml-
‘)
/
(
k
-i-l)
is approximately
k”
(-s
-
l)!/(-r
-
l)!
by (5.83). So we
need $r > s+1$ if the infinite series is going to converge. (If $r$ and $s$ are complex,
the condition is $\Re r > \Re s + 1$, because $|k^z| = k^{\Re z}$.) The sum is
F
-r, 1
(
I)-
,
_
r(r-s-l)T(-s)
s+l
-S
T(r-s)T(-s-l)
=
s+l-r
by (5.92); this is the same formula we found when r and s were integers.
5.91 (It’s best to use a program like MACSYMA for this.) Incidentally,
when c = (a+
1)/2,
this reduces to an identity that’s equivalent to (5.110), in
view of Pfaff's reflection law. For if w =
-z/(
1
-2)
we have 4w( 1
~
w) =
-42/(1
-
z)‘, and
F
ia,
+
a+;-b
1 +a-b
4w(l-w)
=
(l-z)uF(,;;~bir).
5.92 The identities can be proved, as Clausen proved them more than 150
years ago, by showing that both sides satisfy the same differential equation.
One way to write the resulting equations between coefficients of z” is in terms
of binomial coefficients:
(I;)
(3
L’k)
Ck)
F(
r+skl/2)(r+s11/2)
n k
Another way is in terms of hypergeometrics:
F
a,b,
i-a-b-~--n
_
(Za)“(a+b)“(2b)“;
i+a+b,l-a-n,l-b-n
-
(2a+2b)“a”b”
F
$+a,
i
tb,a+b-n,-n
l+a+b,i+a-n,i+b-n
1)
1
=
(1/2)“(1/2+a-b)“(l/2-a+b)”
(1
+a+b)n(l/4-a)K(1/4-b)”
5.93
0~~
n:_,
(f(j) +
cx)/f(j).
(The special case when f is a polynomial of
degree 2 is equivalent to identity (5.133).)
5.94 This is a consequence of Henrici’s “friendly monster” identity,
f(a,z)f(a,wz)f(a,w"z)
F
(
;a-+,
ia++
42
3
=
5a,3a+~i,~a+5,3a-~,fa,~a+~,a
I(
>)
9
'
where f (a, z) =
F(;
a; z). This identity can be proved by showing that both
sides satisfy the same differential equation. If we replace 3n by 3n + 1 or
3n + 2, the given sum is zero.
5.95 See [78] for partial results. The computer experiments were done by
V. A. Vyssotsky.
5.96 All large $n$ have the property, according to Sárközy [256']. Paul Erdős
conjectures that, in fact, $\max_p \epsilon_p\bigl(\binom{2n}{n}\bigr)$ tends to infinity as $n \to \infty$.
5.97 The congruence surely holds if $2n + 1$ is prime. Steven Skiena has also
found the example $n = 2953$, when $2n + 1 = 3 \cdot 11 \cdot 179$.

Ilan Vardi notes that the condition holds for $2n+1 = p^2$, where $p$ is prime,
if and only if $2^{p-1} \bmod p^2 = 1$. This yields two more examples:
$n = (1093^2 - 1)/2$; $n = (3511^2 - 1)/2$.

6.1 2314, 2431, 3241, 1342, 3124, 4132, 4213, 1423, 2143, 3412, 4321.
6.2 ${n \brace k}\, m^{\underline{k}}$, because every such function partitions its domain into $k$
nonempty subsets, and there are $m^{\underline{k}}$ ways to assign function values for each
partition. (Summing over $k$ gives a combinatorial proof of (6.10).)
6.3 Now $d_{k+1} \le$ (center of gravity) $- \epsilon = 1 - \epsilon + (d_1 + \cdots + d_k)/k$. This
recurrence is like (6.55) but with $1 - \epsilon$ in place of 1; hence the optimum
solution is $d_{k+1} = (1 - \epsilon)H_k$. This is unbounded as long as $\epsilon < 1$.
6.4 Hln+’
-
:H,,.
(Similarly
EC”=,
(-l)kp’/k
= Hz,,
-
H,.)
6.5
U,(x,y)
is equal to
+
ky)n-‘.
The
k31
(;)(-l)kp'(~+ky)n-'
=
This proves (6.75). Let
R,(x,y)
=
x~“U,(x,y);
then
Ro(x,y)
=:
0 and
R,(x,y)
=
R,-'(x,y)
+
l/n+y/x,
hence
R,(x,y)
=
H,+ny/x.
(Incidentally, the original sum
U,
=
U,(n,
-1) doesn’t
lead to a recurrence such as this; therefore the more general sum, which de-
taches x from its
dependenice
on n, is easier to solve inductively than its
special case. This is another instructive example where a strong induction
hypothesis makes the difference between success and failure.)
6.6 Each pair of babies bb present at the end of a month becomes a pair
of adults aa at the end of the next month; and each pair aa becomes an
aa and a bb. Thus each bb behaves like a drone in the bee tree and each
aa behaves like a queen, except that the bee tree goes backward in time while
the rabbits are going forward. There are $F_{n+1}$ pairs of rabbits after $n$ months;
$F_n$ of them are adults and $F_{n-1}$ are babies. (This is the context in which
Fibonacci originally introduced his numbers.)

The Fibonacci recurrence is additive, but the rabbits are multiplying.

If the harmonic numbers are worm numbers, the Fibonacci numbers are rabbit numbers.

6.7 (a) Set $k = 1 - n$ and apply (6.107). (b) Set $m = 1$ and $k = n - 1$ and
apply (6.128).
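The rabbit bookkeeping of answer 6.6 can be simulated directly. This is a minimal sketch that assumes the population starts with a single pair of babies at month 0; the function names are illustrative.

```python
def rabbit_census(months):
    """Return (adult pairs, baby pairs) after the given number of months."""
    adults, babies = 0, 1                   # assumption: one baby pair at month 0
    for _ in range(months):
        # every bb grows into an aa; every aa stays an aa and produces a bb
        adults, babies = adults + babies, adults
    return adults, babies

def fib(n):
    """F_n for n >= 0, with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```

After n months the census should read F_n adult pairs and F_{n-1} baby pairs, for a total of F_{n+1}.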
6.8 55 + 8 + 2 becomes 89 + 13 + 3 = 105; the true value is 104.607361.
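The conversion in answer 6.8 shifts each Fibonacci number in the representation of 65 to the next one. A sketch of that procedure follows; it assumes the representation is found greedily (largest Fibonacci number first), and the function names are hypothetical.

```python
def fibonacci_parts(n):
    """Greedy representation of n as a sum of Fibonacci numbers, largest first."""
    fibs = [1, 2]                            # F_2, F_3, ...
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    parts = []
    for f in reversed(fibs):
        if f <= n:
            parts.append(f)
            n -= f
    return parts

def shift_up(n):
    """Replace every F_k in the representation of n by F_{k+1}."""
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    successor = dict(zip(fibs, fibs[1:]))    # F_k -> F_{k+1}
    return sum(successor[f] for f in fibonacci_parts(n))
```

For 65 miles this yields 105, against 65 × 1.609344 = 104.60736 kilometers.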
6.9 21. (We go from $F_n$ to $F_{n+2}$ when the units are squared. The true
answer is about 20.72.)
6.10 The partial quotients $a_0$, $a_1$, $a_2$, ... are all equal to 1, because
$\phi = 1 + 1/\phi$. (The Stern-Brocot representation is therefore RLRLRLRLRL... .)
6.11 $(-1)^{\overline{n}} = [n=0] - [n=1]$; see (6.11).
6.12 This is a consequence of (6.31) and its dual in Table 250.
6.13 The two formulas are equivalent, by exercise 12. We can use induction.
Or we can observe that znDn applied to f(z) =
zx
gives
xnzX
while 9” applied
to the same function gives xnzX; therefore the sequence (a’, 4’
,a2,.
. . ) must
relate to (z”Do,z’D’, z2D2,. . . ) as (x0,
x1,x2,.
) relates to (x”, x1,
x2,.
.
.).
6.14 We have
x(“i”)
=
(k+l)(~~~)
+In-k)(x~~~l),
because (n+l)x= (k+l)(x+k-n)+(n-k)(x+k+l). (It suffices toverify
the latter identity when k = 0, k = -1, and k = n.)
6.15 Since A((‘Ak)) =
(iTi),
we have the general formula
= A”(x”) =
1
j
0
y
(-l)mpi(x
+ j)”
Set x = 0 and appeal to (6.19).
6.16
An,k
=
tj>o
oj
{
“i’};
this sum is always finite.
6.17
(a) [;I = [l:T.!,]. (b) /:I =
n*
= n!
[n3
k]/k!. (c)
IL/
=
k!(z).
6.18 This is equivalent to (6.3) or (6.8). (It follows in particular that
o,(l) =
-na,(O)
= U&n! when n > 1.)
6.19 Use Table 258.
6.20 $\sum_{1\le j\le k\le n} 1/j^2 = \sum_{1\le j\le n} (n+1-j)/j^2 = (n+1)H_n^{(2)} - H_n$.
6.21 The hinted number is a sum of fractions with odd denominators, so
it has the form a/b with a and b odd. (Incidentally, Bertrand’s postulate
implies that b, is also divisible by at least one odd prime, whenever n > 2.)
6.22
Iz/k(k
+ z)I <
2/21/k;’
h
w
en k >
21~1,
so the sum is well defined when
the denominators are not zero. If z = n we have I:=:=, (l/k
-
l/(k
+ n)) =
Hm
-
Hm+n
+
H,,
which approaches
H,
as m -3
co.
(The quantity HZ-r
-
y
is often called the psi function Q(z).)
6.23 $z/(e^z+1) = z/(e^z-1) - 2z/(e^{2z}-1) = \sum_{n\ge0} (1-2^n)B_n z^n/n!$.
6.24 When n is odd, T,,(x) is a polynomial in x2, hence its coefficients
are multiplied by even num.bers when we form the derivative and compute
T,+l (x) by (6.95). (In fact we can prove more: The Bernoulli number
B2,,
always has 2 to the first power in its denominator, by exercise 54; hence
22n
k
\\Tl,,+r
w
2k\\(n
i- 1). The odd positive integers (n + 1
)TJ,+~
/22n
are called Genocchi numbers (1,
1,3,17,155,2073,.
. . ), after Genocchi [117].)
6.25 $100n - nH_n < 100(n-1) - (n-1)H_{n-1} \iff H_{n-1} > 99$. (The least
such $n$ is approximately $e^{99-\gamma}$, while he finishes at $N = e^{100-\gamma}$, about $e$ times
as long. So he is getting closer during the final 63% of his journey.)
6.26 Let u(k) =
HkP1
and Av(k) = l/k, so that u(k) = v(k). Then we have
S,
-
Hi2’
=
I;=,
Hkp,/k
=-
Hip, I;+’
-
5,
=
HL,
-
5,.
6.27 Observe that when
T~I
> n we have
gcd(F,,F,)
= gcd(F,
,,F,)
by
(6.108). This yields a proof by induction.
6.28 (a) Q,, = ol(L, ~ F,,)/2 +
fiFn.
(The solution can also be written
Q,, =
cxF,
1 + BF,.) (b)
L,
=
+”
+
$“.
6.29 When k = 0 the identity is (6.133). When k =
1
it is, essentially,
K(xI,.
.
.
,
x,)x, = K(x,, . . .
,x,)
K(x,,
. . .
,x,)
-
K(x,, . . . ,
~m~~)K(xm+z,...,xn);
in Morse code terms, the second product on the right subtracts out the cases
where the first product has intersecting dashes. When k > 1, an induction
on k suffices, using both (6.127) and (6.132). (The identity is also true when
one or more of the subscripts on K become -1, if we adopt the convention that
K 1 = 0. When multiplication is not commutative, Euler’s identity remains
valid if we write it in the form
K,+.,(xl,...,x,+,iKk(X,+k,...,Xm+l)
=
Km+k(Xl,
. . .
rX,+k)K,(x,+,,...,xmi-1)
+(-l)kK,~,(~,,...,~,~l)K,~k~l(X,+,,..
. ,
%n+k+Z 1.
For example, we obtain the somewhat surprising noncommutative factoriza-
tions
(abc+a+c)(l
+ba)
=
(ab+l)(cba+a+c)
from the case k = 2, m = 0, n = 3.)
6.30 The derivative of K(xl , . . .
,x,)
with respect to xm is
K(x,,...
,xm-l)K(xm+l,...,xn),
and the second derivative is zero; hence the answer is
K(x,, . . .
,x,1
+K(xl,...,xm~1)K(x,+l,...,x,)~
6.31 Since xK =
(L)(n-
l)“k.
(x
+ n
-
1)s =
tk
(L)x”(n
-
I)*,
we have
]z]
=
Th
ese
coefficients, incidentally, satisfy the recurrence
=
(n-l+k)/nkl~+~~~:/,
integersn,k>O.
6.32
Ek,,k{nlk}
= {“+,“+‘} and
&k$,,
{i}(m+l)”
k
= {G’,:}, both
of which appear in Table 251.
6.33 If $n > 0$, we have ${n \brack 3} = \frac12 (n-1)!\,\bigl(H_{n-1}^2 - H_{n-1}^{(2)}\bigr)$, by (6.71);
${n \brace 3} = \frac16 (3^n - 3\cdot 2^n + 3)$, by (6.19).
6.34 We have
(i’)
=
l/(k+
l),
(-,‘) =
Hr!,,
and in general
(z)
is given
by (6.38) for all integers n.
6.35 Let $n$ be the least integer $> 1/\epsilon$ such that $\lfloor H_n\rfloor > \lfloor H_{n-1}\rfloor$.
6.36 Now $d_{k+1} = \bigl(100 + (1+d_1) + \cdots + (1+d_k)\bigr)/(100+k)$, and the solution
is $d_{k+1} = H_{k+100} - H_{101} + 1$ for $k \ge 1$. This exceeds 2 when $k \ge 176$.
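The recurrence of answer 6.36 can be iterated exactly with rational arithmetic. This sketch assumes $d_1 = 0$, as in the text's card-stacking setup; the function and parameter names are illustrative.

```python
from fractions import Fraction

def brick_offsets(count, person=100):
    """Return [d_1, ..., d_count] for a person weighing `person` bricks."""
    d = [Fraction(0)]                        # assumption: d_1 = 0
    running = Fraction(0)                    # (1 + d_1) + ... + (1 + d_k)
    for k in range(1, count):
        running += 1 + d[-1]
        d.append((person + running) / (person + k))
    return d

def harmonic(n):
    return sum(Fraction(1, j) for j in range(1, n + 1))

def first_k_exceeding(bound, count=400):
    """Least k with d_{k+1} > bound, or None if not reached within `count` bricks."""
    d = brick_offsets(count)
    for k in range(1, count):
        if d[k] > bound:                     # d[k] holds d_{k+1}
            return k
    return None
```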
6.37 The sum (by parts) is
H,,
-
(z
+
2
+ . . . +
$-)
=
H,,
-
H,. The
infinite sum is therefore lnm. (It follows that
x
ym(k)
~
:=
k>,
k(k+
1)
mlnm,
m-l
because v,(k) = (m- 1)
xi,,
(k mod
mj)/mj.)
6.38
(-l)‘((“,‘)r-’
-
(1:
])Hk) + C. (By parts, using (5.16).)
6.39 Write it as $\sum_{1\le j\le n} j^{-1} \sum_{j\le k\le n} H_k$ and sum first on $k$ via (6.67), to get
$(n+1)H_n^2 - (2n+1)H_n + 2n$.
6.40 If 6n
-
1 is prime, the numerator of
4n
(-‘)k-
tjy-=
H
4n
1
-Hzn
I
k=l
is divisible by 6n
-
1, because the sum is
Similarly if 6n + 1 is prime, the numerator of
x”,E,
(-1
)km
‘/k =
Han
~ Hln
is a multiple of 6n + 1. For 1987 we sum up to k = 1324.
6.41 $S_{n+1} = \sum_k \binom{\lfloor (n+1+k)/2\rfloor}{k} = \sum_k \binom{\lfloor (n+k)/2\rfloor}{k-1}$,
hence we have $S_{n+1} + S_n = \sum_k \binom{\lfloor (n+k)/2\rfloor + 1}{k} = S_{n+2}$.
The answer is $F_{n+2}$.
6.42 $F_n$.
6.43 Set $z = \frac{1}{10}$ in $\sum_{n\ge0} F_n z^n = z/(1 - z - z^2)$ to get $\frac{10}{89}$. The sum is a
repeating decimal with period length 44:
0.1123595505617977528089887640449438202247191011235955+
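The value 10/89 and its period in answer 6.43 can be confirmed with exact arithmetic. A small sketch, with helper names chosen only for illustration:

```python
from fractions import Fraction

def generating_function_value(z):
    """z / (1 - z - z^2), the closed form for the sum of F_n z^n."""
    return z / (1 - z - z * z)

def period_length(denominator):
    """Multiplicative order of 10 modulo a denominator coprime to 10."""
    k, power = 1, 10 % denominator
    while power != 1:
        power = power * 10 % denominator
        k += 1
    return k

def leading_digits(x, count):
    """First `count` digits after the decimal point of a fraction 0 <= x < 1."""
    digits = []
    remainder = x.numerator % x.denominator
    for _ in range(count):
        remainder *= 10
        digits.append(str(remainder // x.denominator))
        remainder %= x.denominator
    return "".join(digits)
```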
6.44 Replace $(m, k)$ by $(-m, -k)$ or $(k, -m)$ or $(-k, m)$, if necessary, so
that $m \ge k \ge 0$. The result is clear if $m = k$. If $m > k$, we can replace $(m, k)$
by $(m - k, m)$ and use induction.
6.45 X, = A(n)oc+B(n)fi-tC(n)-y+D(n)&, where B(n) = F,, A(n) =
F,
1,
A(n) + B(n)
-
D(n) = 1, and B(n)
-
C(n) + 3D(n) = n.
6.46 $\phi/2$ and $\phi^{-1}/2$. Let $u = \cos 72^\circ$ and $v = \cos 36^\circ$; then $u = 2v^2 - 1$ and
$v = 1 - 2\sin^2 18^\circ = 1 - 2u^2$. Hence $u + v = 2(u+v)(v-u)$, and $4v^2 - 2v - 1 = 0$.
We can pursue this investigation to find the five complex fifth roots of unity:
$1, \quad \frac{\phi^{-1} \pm i\sqrt{2+\phi}}{2}, \quad \frac{-\phi \pm i\sqrt{3-\phi}}{2}.$
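A floating-point check of answer 6.46, as a sketch only; the constant and function names are illustrative.

```python
import math

PHI = (1 + math.sqrt(5)) / 2

def fifth_roots_from_phi():
    """The five fifth roots of unity written in terms of phi."""
    a = math.sqrt(2 + PHI)
    b = math.sqrt(3 - PHI)
    return [complex(1, 0),
            complex(1 / PHI, a) / 2, complex(1 / PHI, -a) / 2,
            complex(-PHI, b) / 2, complex(-PHI, -b) / 2]

def max_error():
    """Largest |z^5 - 1| over the five candidate roots."""
    return max(abs(z ** 5 - 1) for z in fifth_roots_from_phi())
```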
6.47 $2^n\sqrt5\,F_n = (1+\sqrt5)^n - (1-\sqrt5)^n$, and the even powers of $\sqrt5$ cancel
out. Now let $p$ be an odd prime. Then $\binom{p}{2k+1} \equiv 0$ except when $k = (p-1)/2$,
and $\binom{p+1}{2k+1} \equiv 0$ except when $k = 0$ or $k = (p-1)/2$; hence $F_p \equiv 5^{(p-1)/2}$ and
$2F_{p+1} \equiv 1 + 5^{(p-1)/2}$ (mod $p$). It can be shown that $5^{(p-1)/2} \equiv 1$ when $p$ has
the form $10k \pm 1$, and $5^{(p-1)/2} \equiv -1$ when $p$ has the form $10k \pm 3$.

"Let p be any old prime." (See [140], p. 419.)
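The congruences of answer 6.47 can be tested for small primes. This sketch reads the two congruences as $F_p \equiv 5^{(p-1)/2}$ and $2F_{p+1} \equiv 1 + 5^{(p-1)/2}$ (mod p); the helper names are made up.

```python
def fib_mod(n, m):
    """F_n modulo m, by direct iteration."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, (a + b) % m
    return a % m

def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def congruences_hold(p):
    """Check both congruences for an odd prime p."""
    t = pow(5, (p - 1) // 2, p)
    return fib_mod(p, p) == t and (2 * fib_mod(p + 1, p)) % p == (1 + t) % p

def euler_sign(p):
    """5^((p-1)/2) mod p, reported as +1, -1, or 0 (the last only for p = 5)."""
    t = pow(5, (p - 1) // 2, p)
    return -1 if t == p - 1 else t
```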
6.48 This must be true because (6.138) is a polynomial identity and we can
set a, = 0.
6.49 Set
z
=
i
in (6.146); the partial quotients are 0,
2F3,
2F1,
2Fl, . .
(Knuth
[172]
noted that this number is transcendental.)
6.50 (a) f(n) is even
tl
3\n. (b) If the binary representation of n is
(la'o"z...
lam
‘o”m )2,
where m is even, we have f(n) =
K(a’,az,.
,a, ‘).
6.51 (a) Combinatorial proof: The arrangements of
{l
,2,.
. , p} into k sub-
sets or cycles are divided into “orbits” of 1 or p arrangements each, if we
add 1 to each element modulo p. For example,
{1,2,4I'J{3,51
+
{2,3,5IuI4,11
+
13,4,lIu{5,21
--f
14,5,2P~u,31
+
15,1,31w2,41
+
11,2,41~{3,51.
We get an orbit of size 1 only when this transformation takes an arrangement
into itself; but then k = 1 or k =
p.
Alternatively, there’s an algebraic proof:
WehavexP-xfl-+xLandxE=xP-
x (mod p), since Fermat’s theorem tells
us that
x”
-x
is divisible by
(x-0)(x
-
1). . . (x
-
(p-l)).
(b) This result follows from (a) and Wilson’s theorem; or we can use
xp-’ E
xF/(x-1)
-
l(XP
-x)/(x- 1)
=xp
'
+xp-2 + .
..+x.
(c) We have
{“l’}
E [“:‘I
z
0 for 3 6 k 6
p,
then {“12} E [pl’]
G
0
for 4 < k < p, etc. (Similarly, we have
[‘PpP’]
s
-{2ppm’}
E 1.)
(d) p!
=pP=
tk(-l)pPkpk[;]
=pP[~]-pP~'[pp,]+...+p3[~]~
P'[;]
+P[$
But
P[:]
=
P!,
so
[I]
=
P[I]
--P$]
+...+ppyP]
is a multiple of
p2.
(This is called Wolstenholme’s theorem.)
6.52 (a) Observe that
H,
= Hk + HlnlP~/p, where Hz =
xc=,(k
I
p)/k.
(b) Working mod 5 we have H, = (0,
1,4,1,0)
for 0 <
r
< 4. Thus the first
solution is n = 4. By part (a) we know that
5\a,
=$
5\ajn,sl;
so the next
possible range is n
==
20 + r, 0 <
r
6 4, when we have
H,
=
Ht
+
&H4
=
H;e+~Hq+H,+~~=,
20/k(20+k).
The numerator of
H;,,
like the numerator
of HJ, is divisible by 25. Hence the only solutions in this range are n = 20
and n = 24. The next possible range is n = 100
+
r;
now
H,
=
H;
+
&Hzo,
which is ;Hzo + H, plus a fraction whose numerator is a multiple of 5. If
$Hzo
%
m (mod 5), where m is an integer, the harmonic number
H’~o+~
will
have a numerator divisible by 5 if and only if m + H,
e
0 (mod 5); hence
m must be E 0, 1, or 4. Working modulo 5 we find $Hls =
&Hz0
+ &H4 3
-&Hh
= & E 3; hence there are no solutions for 100 6 n < 104. Similarly
there are none for 120 < n 6 124; we have found all three solutions.
(By exercise 6.51(d), we always have $p^2 \backslash a_{p-1}$, $p \backslash a_{p^2-p}$, and $p \backslash a_{p^2-1}$,
if $p$ is any prime $\ge 5$. The argument just given shows that these are the only
solutions to $p \backslash a_n$ if and only if there are no solutions to $p^{-2}H_{p-1} + H_r \equiv 0$
(mod $p$) for $0 \le r < p$. The latter condition holds not only for $p = 5$ but
also for $p$ = 13, 17, 23, 41, and 67, and perhaps for infinitely many primes. The
numerator of $H_n$ is divisible by 3 only when $n$ = 2, 7, and 22; it is divisible
by 7 only when $n$ = 6, 42, 48, 295, 299, 337, 341, 2096, 2390, 14675, 16731,
16735, and 102728.)

(Attention, computer programmers: Here's an interesting condition to test,
for as many primes as you can.)
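The lists of n quoted in this answer can be reproduced for small n by exact summation. This is a brute-force sketch rather than the modular argument used in the answer; the function name and the search limits are arbitrary choices.

```python
from fractions import Fraction

def numerator_hits(p, limit):
    """All n <= limit for which the numerator of H_n (in lowest terms) is divisible by p."""
    h = Fraction(0)
    hits = []
    for n in range(1, limit + 1):
        h += Fraction(1, n)                  # Fraction keeps H_n in lowest terms
        if h.numerator % p == 0:
            hits.append(n)
    return hits
```

Reaching the larger values such as 102728 this way would be slow; the reduction modulo p described in the answer is the practical route.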
6.53 Summation by parts yields
$$$(s((“-ZIHm+j
-1)-l).
6.54 (a) If m 3 p we have S,(p) E
S,m~,,P,,(p)
(mod p), since
kP
= 1
when1
<k<p.
Also.&-l(p)=p-1
~-1.
IfO<m<p-1,wecanwrite
(b) The condition in the
h.int
implies that the denominator of Iln is not
divisible by any prime p;
he:nce
II,, must be an integer. To prove the hint, we
may assume that n > 1. Then
B2n
+
[(p-l
)\Vni]
+
‘E2
P
k=O
is an integer, by (6.78), (6.84), and part (a). So we want to verify that none
of the fractions (‘“,”
)Bkp2"
k/(2n + 1) = (2~)Bkp2”~k/(2n
-
k + 1) has a
denominator divisible by p. The denominator of (‘p)Bkp isn’t divisible by p,
since
Bk
has no
p2
in its denominator (by induction); and the denominator
of
p2"
km
‘/(2n
-
k + 1) isn’t divisible by p, since 2n
-
k + 1 <
p'"
k
when
k 6
2n-2;
QED. (The numbers
1~~
are tabulated in
[185].
Hermite calculated
them through
1~s
in 1875
[153].
It turns out that
12
=
14
=
16
= Is =
110
=
112
= 1; hence there is actually a “simple” pattern to the Bernoulli
numbers displayed in the text, including
-$$(!).
But the numbers
I?,,
don’t
seem to have any memorable features when n > 6. For example,
BJ~
=
-86579
-
i
-
f
-
&
-
f
-
if,
and 86579 is prime.)
(c) The numbers
2-
1 and
3-
1 always divide 2n. If n is prime, the only
divisors of 2n are
1,
2, n, and 2n, so the denominator of
BJ,,
for prime n > 2
will be 6 unless
2n+l
is also prime. In the latter case we can try
4n+3,
8n+7,
. . . ,
until we eventually hit a nonprime (since n divides 2”
‘n
+ 2”
-
1).
(This proof does not need the more difficult, but true, theorem that there are
infinitely many primes of the form 6k + 1.) The denominator of
BJ,,
can be 6
also when n has nonprime values, such as 49.
(The numerators of Bernoulli numbers have important connections to
the known results about Fermat's Last Theorem; see Ribenboim [249].)
6.55 The stated sum is
*(Xzn)(mn’;l)~
by Vandermonde’s convolution.
To get
(6.70)~
differentiate and set x = 0.
6.56 First replace k”+’ by ((k
-
m) + m) n + 1 and expand in powers of
k
-
m; simplifications occur as in the derivation of (6.72). If m > n or
m < 0, the answer is (-1 )“n!
-
m”/(“*“‘). Otherwise we need to take the
limit of (5.41) minus the term for k = m, as x
+
-m;
the answer comes to
(-l)nn!+(-l)m+l(~)mn(n+l
+mH,,
m-mHm).
6.57 First prove by induction that the nth row contains at most three
distinct values A,,
13
B, 3
C,;
if n is even they occur in the cyclic or-
der
[C,,,B,,A,,B,,C,],
while if n is odd they occur in the cyclic order
[C,,B,,,A,,A,,B,I.
Also
A2,+1
=
A2n
+
Bzn;
ALn
=
2A2n
I;
B2n+1
=
B2n
+Cz,;
Bin =
A2n
I
+Bzn
I;
CZn+l
=
2c>,;
C
2n
=
B
2n
1+C2n-1.
It follows that Q,,
==
A,
-
C,
=
F,+,.
(See exercise 5.75 for wraparound
binomial coefficients of order 3.)
6.58 (a)
x
n>OF;z'L
~(1
-z)/(l
+z)(l
-3z+t’)
=
;((2-3z)/(l
-32+
z2)
-2/(1
+zij.
(b)
1
naOF;t~n=z(l-2z-z2)/(1-4z-z2)(1+z-z2)=
;(2z/(l
-42-z2)+32/(1
+z-2’)).
(These formulas are obtained by squaring
or cubing Binet’s formula (6.123) and summing on n, then combining terms
so that
@
and $ disappear.) It follows that
Fi,,
-
4Ff,
-
FA
,
=
3(-l)nF,.
(The corresponding recurrence for mth powers has been found by Jarden and
Motakin [163].)
6.59 Let m be fixed. We can prove by induction on n that it is, in fact,
possible to find such an x with the additional condition x $ 2 (mod 4). If x
is such a solution, we can move up to a solution modulo 3”+’ because
F8.3r,m,
G
3".
F8.3rIm~
,
G
3n+1
(mod
3”+‘);
either x or x +
8.3”
or
x
+ 16.3”
will do the job
6.60
F1
+
1,
F2
+ 1,
F3
+ 1,
FJ
-
1,
and
F6
-
1 are the only cases. Otherwise
the Lucas numbers of exercise 28 arise in the factorizations
F2m+(-lIm
= L,+,F,
I;
F2m+1+(-lIm
= LnF,,,;
F2m-(-lIm
= L,
IF,+,;
F2m+1-(-lim
=
L+,F,.
(We have F,,,
-
(-l)nF,p,
=
L,F,
in general.)
6.61
1/F2,
= F, ,
/F,
-
FL,,-, /Fz,,, when m is even and positive. The
second sum is 5/4
-
F3.pp~/F3.21t,
for n 3 1.
6.62 (a) A,, =
&?A,-,
--
A,-2
and B, =
&B,
1
-
B,~~2.
Incidentally,
we also have
&A,
+
B,
= 2A,+, and
fig,,
-
A,, =
2B,
1.
(b) A table of
small values reveals that
\/5F,,
n
even;
L
n,
n odd.
(cl
WA,+1
-
B,-l/A,
= l/(Frn+l + 1) because
B,A,
-
B, rA,+l =
&
and
A,A,+l
=
&(Fz,+l
+ 1). Notice that
B,/A,+l
=
(F,/F,+r)[n
even] +
(L,/L,+li[n
odd].
(d)
Similarly, xi=,
1/(F2kmkl
-
1)
= (Ao/BI
-
Al/B2) +
.. + (Anm~l/B,
-
A,/B,+j
) = 2
-
A,/B,+,
This quantity can also be
expressed as
(5F,/L,+l)
[n even] +
(L,/F,+,
) [n odd].
6.63 (a) [z]. There are [;‘-:I with n, = n and (n
-
l)[nk’] with n, < n.
(b)
(i).
Each permutation
pr
. .
on
-1
of 11,. . , n-
1)
leads to n permutations
n17t2..
.n,
=
p1
pj
1 n pj+l . . . on-l pi. If
p1
.
pn
1 has k excedances,
there are k+ 1 values of j that yield k excedances in
7~17~2
. . .
n,;
the remaining
n- 1~ k values yield k+ 1.
I-lence
the total number of ways to get k excedances
innln2...nnis(k+l)(“,‘)+((n-l)-(k-l))(z:;)=(t).
6.64 The denominator of (l’,‘) is 24nPvJini, by the proof in exercise 5.72.
The denominator of [
,I:‘,]
is the same, by (6.44), because
((i))
= 1 and
KS)
is even for k > 0.
6.65 This is equivalent to saying that (L)/n! is the probability that we
have
1x1
+ +
xnJ
= k, when
xl,
,
x,
are independent random numbers
uniformly distributed between 0 and 1. Let
yj
=
(x1
+ . . . +
Xj
) mod
1.
Then
Yl,
f..,
y,, are independently and uniformly distributed, and
1x1
+.
. . +
x,J
is the number of descents in the y’s. The permutation of the y’s is random,
and the probability of k descents is the same as the probability of k ascents.
6.66 We have the general formula
((;))
=
E
(2nr’){n~~:!iik}(-ll”.
for n > m > 0,
analogous to (6.38). When m = 2 this equals
((I)) =
{n:3}~-(2n+l){n~2}+
(‘“:‘){“:‘}
=
;3n+2
-
(2n + 3)2n+’
+i(4nz+6n+3).
6.67
~n(n+~)(n+l)(2H2n-H,)-&n(10n2+9n-1).
(It wouldbenice
to automate the derivation of formulas such as this.)
6.68 1 /k
-
1
/(k
+ z) = z/k2
-
z2/k3 + , and everything converges when
121
< 1.
6.69 Note that
nL=,
(1 + z/k)ePLlk =
(nnfL)nPLe(lnn
Hlllr.
If f(z) = &(z!)
we find
f(z)/z!
+ y q = H,.
6.70 For tan z, we can use tan
z
= cot
z
-
2 cot 22 (which is equivalent to the
identity of exercise 23). Also z/sin
z
= zcot
z
+ ztan
:Z
has the power series
&o(-l)“P’(4n
-
2)Bznz2”/(2n)!; and
tan2
In-
=In
-
-“”
lncosz
z
4n(4n-l)B2,~2n
(2n)(2n)!
4*(4" -2)BLnz2"
=
IL-‘)”
(2n)(2n)!
n3
1
because
-&
In sin
z
= cot
z
and
-&
In cos
z
= -tan z.
6.71 Since tan2z
-
sec22 = (sin2 + cosz)/(cosz
-
sinz), setting x = 1 in
(6.94)
gives
T, (1) =
2nT,,
when n is odd,
T,,
(1) = 2"E,
when n is even, where
1
/cos
2 =
tn30
Ezn,
-‘“/(2n)’ (The
E,
are called Euler numbers, not to be
confused with the Eulerian numbers (L)
.)
6.72
2n+1(2n+'
-
l)B
,+i/(n
+
l),
if n > 0. (See (7.56) and (6.92); the
desired numbers are essentially the coefficients of 1
-
tanhz.)
6.73 cot(z +
rr)
= cot
z
and cot(z +
irr)
= -tan z; hence the identity is
equivalent to
cot
z
=
-
in
2gl
cot
3&z
)
k=O
which follows by induction from the case n = 1. The stated limit follows since
zcot
z
+
1 as
z
-+ 0. It can be shown that term-by-term passage to the limit
is justified, hence (6.88) is valid. (Incidentally, the general formula
cot
z
=
-
;
rcot
q?
k=O
is also true. It can be proved from (6.88), or from
1
~
=
en2
-
1
k=O
which is equivalent to the partial fraction expansion of
l/(2”
-
l).)
6.74 If p(x) is any polynomial of degree 6 n, we have
because this equation holds for x = 0, -1, . . . , -n. The stated identity is
the special case where p(x) =
xo,
(x) and x = 1. Incidentally, we obtain
a simpler expression for Bernoulli numbers in terms of Stirling numbers by
setting k = 1 in (6.99):
j-l)‘&
=
B,
6.75 Sam Loyd [204, pages 288 and
3781
gave
the construction
and claimed to have invented (but not published) the 64 = 65 arrangement
in 1858. (Similar paradoxes go back at least to the eighteenth century, but
Loyd found better ways to present them.)
6.76 We expect $A_m/A_{m-1} \approx \phi$, so we try $A_{m-1} = 618034 + r$ and
$A_{m-2} = 381966 - r$. Then $A_{m-3} = 236068 + 2r$, etc., and we find
$A_{m-18} = 144 - 2584r$, $A_{m-19} = 154 + 4181r$. Hence $r = 0$, $x = 154$, $y = 144$, $m = 20$.
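The answer to 6.76 can be confirmed by exhaustive search. The sketch assumes the problem is to choose positive integers $A_1 = x$, $A_2 = y$ with $A_n = A_{n-1} + A_{n-2}$ so that $A_m = 1000000$ with $m$ as large as possible; the function name is illustrative.

```python
def longest_backward_chain(target=1_000_000):
    """Return (m, x, y) maximizing m, found by unwinding the recurrence."""
    best = (0, None, None)
    # If the predecessor is at most target/2 the chain stops after three terms,
    # so only larger predecessors need to be tried.
    for prev in range(target // 2 + 1, target):
        a, b = target, prev                  # (A_j, A_{j-1}), walking j downward
        length = 2
        while a - b > 0:                     # A_{j-2} = A_j - A_{j-1} must stay positive
            a, b = b, a - b
            length += 1
        if length > best[0]:
            best = (length, b, a)            # the walk ends with b = A_1, a = A_2
    return best
```

The search runs over about half a million candidates, which takes on the order of a second in CPython.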
6.77 If
P(F,+j,
F,) = 0 for infinitely many
even.
values of n, then P(x,y) is
divisible by U(x,y)
-
1, where U(x,y) = x2
-
xy
-
y2. For if t is the total
degree of P, we can write
P(XjYI
=
~4kXkY’pk
+
t
rj,kXjyk
=
Q(x,Y)
+R(x,y)
k=O
j+k<t
Then
P(Fn+l,Fn)
Fh
and we have xk=,
qk+k
= 0 by taking the limit as n
+
03.
Hence Q(x,y) is
a multiple of U(x,y), say A(x,y)U(x,y). But U(F,+,,F,) = (-1)” and n is
even, so
Pc(x,y)
= P(x,y)
-
(U(x,y)
-
l)A(x,y) is another polynomial such
that
Po(F,+,
, F,) = 0. The total degree of
PO
is less than t, so
PO
is a multiple
of U
-
1 by induction on t.
Similarly, P(x,y) is divisible by U(x,y) + 1 if
P(F,+,
,F,)
= 0 for in-
finitely many odd values of n. A combination of these two facts gives the
desired necessary and sufficient condition:
P(x,
y) is divisible by
U(x,
y)’
-
1.
6.78 First add the digits without carrying, getting digits 0, 1, and 2. Then
use the two carry rules
$$0\,(d{+}1)\,(e{+}1) \to 1\,d\,e,$$
$$0\,(d{+}2)\,0\,e \to 1\,d\,0\,(e{+}1),$$
always applying the leftmost applicable carry. This process terminates because
the binary value obtained by reading $(b_m \ldots b_2)_F$ as $(b_m \ldots b_2)_2$
increases whenever a carry is performed. But a carry might propagate to the
right of the "Fibonacci point"; for example, $(1)_F + (1)_F$ becomes $(10.01)_F$. Such
rightward propagation extends at most two positions; and those two digit
positions can be zeroed again by using the text's "add 1" algorithm if necessary.
Incidentally, there's a corresponding "multiplication" operation on
nonnegative integers: If $m = F_{j_1} + \cdots + F_{j_q}$ and $n = F_{k_1} + \cdots + F_{k_r}$ in the
Fibonacci number system, let $m \circ n = \sum_{b=1}^{q}\sum_{c=1}^{r} F_{j_b+k_c}$, by analogy with
multiplication of binary numbers. (This definition implies that $m \circ n \approx \sqrt5\,mn$
when $m$ and $n$ are large, although $1 \circ n \approx \phi^2 n$.) Fibonacci addition leads to a
proof of the associative law $l \circ (m \circ n) = (l \circ m) \circ n$.
Exercise: $m \circ n = mn + \lfloor (m+1)/\phi \rfloor n + m \lfloor (n+1)/\phi \rfloor$.
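The "multiplication" $m \circ n$ of answer 6.78 and the margin exercise can be checked numerically. The following is a minimal sketch, not the book's method: the function names are hypothetical, the representation is obtained greedily (Zeckendorf), and $\lfloor x/\phi\rfloor$ is computed exactly as $\lfloor(\sqrt5\,x - x)/2\rfloor$ with integer square roots.

```python
from math import isqrt

def fib(k):
    """F_k with F_0 = 0, F_1 = 1."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a

def zeckendorf(n):
    """Indices k >= 2 with n = sum of F_k, no two indices consecutive (greedy)."""
    fibs = [0, 1]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    idx = []
    for k in range(len(fibs) - 1, 1, -1):
        if fibs[k] <= n:
            idx.append(k)
            n -= fibs[k]
    return idx

def circle(m, n):
    """m o n = sum of F_{j+k} over the Fibonacci indices j of m and k of n."""
    return sum(fib(j + k) for j in zeckendorf(m) for k in zeckendorf(n))

def floor_div_phi(x):
    """floor(x / phi), using x / phi = (sqrt(5) x - x) / 2."""
    return (isqrt(5 * x * x) - x) // 2

def margin_formula(m, n):
    return m * n + floor_div_phi(m + 1) * n + m * floor_div_phi(n + 1)

print(circle(1, 1), circle(2, 3), margin_formula(2, 3))
```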
6.79 Yes; for example, we can take
$A_0 = 331635635998274737472200656430763$;
$A_1 = 1510028911088401971189590305498785$.
The resulting sequence has the property that $A_n$ is divisible by (but unequal
to) $p_k$ when $n \bmod m_k = r_k$, where the numbers $(p_k, m_k, r_k)$ have the
following 18 respective values:
(3,4,1)        (2,3,2)        (5,5,1)
(7,8,3)        (17,9,4)       (11,10,2)
(47,16,7)      (19,18,10)     (61,15,3)
(2207,32,15)   (53,27,16)     (31,30,24)
(1087,64,31)   (109,27,7)     (41,20,10)
(4481,64,63)   (5779,54,52)   (2521,60,60)
One of these triples applies to every integer $n$; for example, the six triples in
the first column cover every odd value of $n$, and the middle column covers all
even $n$ that are not divisible by 6. The remainder of the proof is based on
the fact that $A_{m+n} = A_mF_{n-1} + A_{m+1}F_n$, together with the congruences
for each of the triples $(p_k, m_k, r_k)$. (An improved solution, in which $A_0$ and $A_1$
are numbers of "only" 17 digits each, is also possible [184].)
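The covering claim in answer 6.79 (every integer $n$ falls under at least one triple) is easy to confirm by machine. The sketch below checks only the pairs $(m_k, r_k)$; it does not test the divisibility of the large numbers $A_0$ and $A_1$. The names `TRIPLES` and `uncovered` are hypothetical.

```python
from math import lcm

# (p_k, m_k, r_k) as listed in the answer, column by column.
TRIPLES = [
    (3, 4, 1), (7, 8, 3), (47, 16, 7), (2207, 32, 15), (1087, 64, 31), (4481, 64, 63),
    (2, 3, 2), (17, 9, 4), (19, 18, 10), (53, 27, 16), (109, 27, 7), (5779, 54, 52),
    (5, 5, 1), (11, 10, 2), (61, 15, 3), (31, 30, 24), (41, 20, 10), (2521, 60, 60),
]

def uncovered(triples):
    """Residues n modulo lcm of all m_k that satisfy none of n mod m_k == r_k mod m_k."""
    period = lcm(*(m for _, m, _ in triples))
    return [n for n in range(period)
            if not any(n % m == r % m for _, m, r in triples)]

print(uncovered(TRIPLES))            # an empty list means the congruences cover all n
print(uncovered(TRIPLES[:6])[:5])    # the first column alone misses only even n
```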
6.80 The matrix product is
$$\begin{pmatrix} K_{n-2}(x_2,\ldots,x_{n-1}) & K_{n-1}(x_2,\ldots,x_{n-1},x_n)\\ K_{n-1}(x_1,x_2,\ldots,x_{n-1}) & K_n(x_1,x_2,\ldots,x_{n-1},x_n) \end{pmatrix}.$$
This relates to products of L and R as in (6.137), because we have
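The determinant is $K_n(x_1, \ldots, x_n)$; the more general tridiagonal determinant
$$\det\begin{pmatrix} x_1 & 1 & 0 & \ldots & 0\\ y_2 & x_2 & 1 & & 0\\ 0 & y_3 & x_3 & 1 & \vdots\\ \vdots & & \ddots & \ddots & 1\\ 0 & 0 & \ldots & y_n & x_n \end{pmatrix}$$
satisfies the recurrence $D_n = x_nD_{n-1} - y_nD_{n-2}$.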
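As an illustration of answer 6.80, the sketch below multiplies the matrices $\bigl(\begin{smallmatrix}0&1\\1&x_i\end{smallmatrix}\bigr)$ and compares the entries with continuants. It assumes the exercise's product is taken over exactly these matrices, from left to right; the function names are hypothetical.

```python
def continuant(xs):
    """K_n(x_1..x_n) via K_n = x_n K_{n-1} + K_{n-2}, with K_0 = 1, K_{-1} = 0."""
    k_prev, k = 0, 1
    for x in xs:
        k_prev, k = k, x * k + k_prev
    return k

def matrix_product(xs):
    """Product of [[0, 1], [1, x]] over x in xs, left to right."""
    a, b, c, d = 1, 0, 0, 1                      # identity matrix
    for x in xs:
        a, b, c, d = b, a + b * x, d, c + d * x  # right-multiply by [[0,1],[1,x]]
    return (a, b), (c, d)

xs = [3, 1, 4, 1, 5]
print(matrix_product(xs))
print(continuant(xs[1:-1]), continuant(xs[1:]), continuant(xs[:-1]), continuant(xs))
```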
6.81 Let $\alpha^{-1} = a_0 + 1/(a_1 + 1/(a_2 + \cdots))$ be the continued fraction
representation of $\alpha^{-1}$. Then we have
$$\frac{a_0}{z} + \cfrac{1}{A_0(z) + \cfrac{1}{A_1(z) + \cfrac{1}{A_2(z) + \cdots}}} = \frac{1-z}{z}\sum_{n\ge1} z^{\lfloor n\alpha\rfloor},$$
where
$$A_m(z) = \frac{z^{-q_{m+1}} - z^{-q_{m-1}}}{z^{-q_m} - 1}, \qquad q_m = K(a_1,\ldots,a_m).$$
A proof analogous to the text's proof of (6.146) uses a generalization of
Zeckendorf's theorem (Fraenkel [104, §4]). If $z = 1/b$, where $b$ is an integer $\ge 2$,
this gives the continued fraction representation of the transcendental number
$(b-1)\sum_{n\ge1} b^{-\lfloor n\alpha\rfloor}$, as in exercise 49.
6.82 The sequences of exercise 62 satisfy $A_{-m} = A_m$, $B_{-m} = -B_m$, and
$$A_mA_n = A_{m+n} + A_{m-n};\quad A_mB_n = B_{m+n} - B_{m-n};\quad B_mB_n = A_{m+n} - A_{m-n}.$$
Let $f_k = B_{mk}/A_{mk+l}$ and $g_k = A_{mk}/B_{mk+l}$, where $l = \frac12(n - m)$. Then
$f_{k+1} - f_k = A_lB_m/(A_{2mk+n} + A_m)$ and $g_k - g_{k+1} = A_lB_m/(A_{2mk+n} - A_m)$;
hence we have
2
=
~
-
s;,,
.
FthLm
6.83 Let $p = K(0, a_1, a_2, \ldots, a_m)$, so that $p/n$ is the $m$th convergent to the
continued fraction. Then $\alpha = p/n + (-1)^m/nq$, where $q = K(a_1, \ldots, a_m, \beta)$
and $\beta > 1$. The points $\{k\alpha\}$ for $0 \le k < n$ can therefore be written
$$\frac{0}{n},\ \ \frac{1}{n} + \frac{(-1)^m\pi_1}{nq},\ \ \ldots,\ \ \frac{n-1}{n} + \frac{(-1)^m\pi_{n-1}}{nq},$$
where $\pi_1 \ldots \pi_{n-1}$ is a permutation of $\{1, \ldots, n-1\}$. Let $f(v)$ be the number
of such points $< v$; then $f(v)$ and $vn$ both increase by 1 when $v$ increases from
$k/n$ to $(k+1)/n$, except when $k = 0$ or $k = n - 1$, so they never differ by 2
or more.
6.84 By (6.139) and (6.136), we want to maximize $K(a_1, \ldots, a_m)$ over all
sequences of positive integers whose sum is $< n + 1$. The maximum occurs
when all the $a$'s are 1, for if $j \ge 1$ and $a \ge 1$ we have
$$K_{j+k+1}(1,\ldots,1,a{+}1,b_1,\ldots,b_k)$$
$$= K_{j+k+1}(1,\ldots,1,a,b_1,\ldots,b_k) + K_j(1,\ldots,1)K_k(b_1,\ldots,b_k)$$
$$\le K_{j+k+1}(1,\ldots,1,a,b_1,\ldots,b_k) + K_{j+k}(1,\ldots,1,a,b_1,\ldots,b_k)$$
$$= K_{j+k+2}(1,\ldots,1,a,b_1,\ldots,b_k).$$
(Motzkin and Straus [220] solve more general maximization problems on
continuants.)
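The maximization claim of answer 6.84 can be spot-checked by brute force over all compositions of a small total. This is only a sanity check under the assumption that the arguments are positive integers with a fixed sum; the helper names are hypothetical.

```python
def continuant(xs):
    """K_n(x_1..x_n) via K_n = x_n K_{n-1} + K_{n-2}, K_0 = 1, K_{-1} = 0."""
    k_prev, k = 0, 1
    for x in xs:
        k_prev, k = k, x * k + k_prev
    return k

def compositions(total):
    """All tuples of positive integers with the given sum."""
    if total == 0:
        yield ()
        return
    for first in range(1, total + 1):
        for rest in compositions(total - first):
            yield (first,) + rest

def max_continuant(total):
    return max(continuant(c) for c in compositions(total))

for s in range(1, 8):
    print(s, max_continuant(s), continuant((1,) * s))
```

The two printed columns agree, and the common value is the Fibonacci number $F_{s+1}$.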
6.85 The property holds if and only if $N$ has one of the seven forms $5^k$,
$2\cdot5^k$, $4\cdot5^k$, $3^j\cdot5^k$, $6\cdot5^k$, $7\cdot5^k$, $14\cdot5^k$.
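The exercise statement is not reproduced in this section, so the following sketch rests on an assumption: that "the property" is that the residues $F_n \bmod N$ take every value $0, 1, \ldots, N-1$. Under that reading, the seven forms can be compared with a direct search; both function names are hypothetical.

```python
def fibonacci_residues_complete(N):
    """True if {F_n mod N} contains every residue 0..N-1 (assumed meaning of 'the property')."""
    seen = set()
    a, b = 0, 1 % N
    for _ in range(6 * N + 2):        # the period of F_n mod N is at most 6N
        seen.add(a)
        a, b = b, (a + b) % N
    return len(seen) == N

def has_listed_form(N):
    """True if N is 5^k, 2*5^k, 4*5^k, 3^j*5^k, 6*5^k, 7*5^k, or 14*5^k."""
    while N % 5 == 0:
        N //= 5
    if N in (1, 2, 4, 6, 7, 14):
        return True
    while N % 3 == 0:
        N //= 3
    return N == 1

print([N for N in range(1, 101) if fibonacci_residues_complete(N)])
```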
6.86 A candidate for the case $n \bmod 1 = \frac12$ appears in [179, section 6],
although it may be best to multiply the integers discussed there by some
constant involving $\sqrt\pi$.
Another reason to remember 1066?
6.87 (a) If there are only finitely many solutions, it is natural to conjecture
that the same holds for all primes. (b) The behavior of $b_n$ is quite
strange: We have $b_n = \mathrm{lcm}(1, \ldots, n)$ for $968 \le n \le 1066$; on the other hand,
$b_{600} = \mathrm{lcm}(1, \ldots, 600)/(3^3 \cdot 5^2 \cdot 43)$. Andrew Odlyzko observes that $p$ divides
$\mathrm{lcm}(1, \ldots, n)/b_n$ if and only if $kp^m \le n < (k+1)p^m$ for some $m \ge 1$ and
some $k < p$ such that $p$ divides the numerator of $H_k$. Therefore infinitely
many such $n$ exist if it can be shown, for example, that almost all primes
have only one such value of $k$ (namely $k = p - 1$).
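The numerical claims in part (b) of answer 6.87 can be reproduced with exact rational arithmetic. The sketch assumes that $b_n$ denotes the denominator of the harmonic number $H_n$ in lowest terms (the exercise statement is not part of this section); the function name is hypothetical.

```python
from fractions import Fraction
from math import lcm

def lcm_over_denominator(n_max):
    """Map n -> lcm(1..n) / (denominator of H_n in lowest terms), for 1 <= n <= n_max."""
    h, l, out = Fraction(0), 1, {}
    for n in range(1, n_max + 1):
        h += Fraction(1, n)
        l = lcm(l, n)
        out[n] = l // h.denominator
    return out

ratios = lcm_over_denominator(1066)
print(ratios[600], 3**3 * 5**2 * 43)
print(all(ratios[n] == 1 for n in range(968, 1067)))
```

Odlyzko's criterion explains the left end of the range: $11$ divides the numerator of $H_7 = 363/140$, and $7\cdot11^2 \le n < 8\cdot11^2 = 968$ stops holding exactly at $n = 968$.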
6.88 (Brent [33] found the surprisingly large partial quotient 1568705 in $e^\gamma$,
but this seems to be just a coincidence. For example, Gosper has found even
larger partial quotients in $\pi$: The 453,294th is 12996958 and the 11,504,931st
is 878783625.)
6.89 Consider the generating function tm,nZO ]“‘~“(w”‘z~, which has the
form ,Yn(wF(a,b,c)
+zF(a’,b’,~‘))~,
where F( a, b, c) is the differential op-
erator a +
b4,
+ ~4,.
7.1 Substitute $z^4$ for ▯ and $z$ for ▭ in the generating function, getting
$1/(1 - z^4 - z^2)$. This is like the generating function for $T$, but with $z$ replaced
by $z^2$. Therefore the answer is zero if $m$ is odd, otherwise $F_{m/2+1}$.
7.2 $G(z) = 1/(1 - 2z) + 1/(1 - 3z)$; $\hat G(z) = e^{2z} + e^{3z}$.
7.3 Set $z = 1/10$ in the generating function, getting $\frac{10}{9}\ln\frac{10}{9}$.
7.4 Divide $P(z)$ by $Q(z)$, getting a quotient $T(z)$ and a remainder $P_0(z)$
whose degree is less than the degree of $Q$. The coefficients of $T(z)$ must be
added to the coefficients $[z^n]\,P_0(z)/Q(z)$ for small $n$. (This is the polynomial
$T(z)$ in (7.28).)
7.5 This is the convolution of $(1 + z^2)^r$ with $(1 + z)^r$, so
$$S(z) = (1 + z + z^2 + z^3)^r.$$
Incidentally, no simple form is known for the coefficients of this generating
function; hence the stated sum probably has no simple closed form. (We can
use generating functions to obtain negative results as well as positive ones.)
7.6 Let the solution to $g_0 = \alpha$, $g_1 = \beta$, $g_n = g_{n-1} + 2g_{n-2} + (-1)^n\gamma$ be
$g_n = A(n)\alpha + B(n)\beta + C(n)\gamma$. The function $2^n$ works when $\alpha = 1$, $\beta = 2$,
$\gamma = 0$; the function $(-1)^n$ works when $\alpha = 1$, $\beta = -1$, $\gamma = 0$; the function
$(-1)^n n$ works when $\alpha = 0$, $\beta = -1$, $\gamma = 3$. Hence $A(n) + 2B(n) = 2^n$,
$A(n) - B(n) = (-1)^n$, and $-B(n) + 3C(n) = (-1)^n n$.
I bet that the controversial "fan of order zero" does have one spanning tree.
7.7 $G(z) = (z/(1-z)^2)\,G(z) + 1$, hence
$$G(z) = \frac{1 - 2z + z^2}{1 - 3z + z^2} = 1 + \frac{z}{1 - 3z + z^2};$$
we have $g_n = F_{2n} + [n=0]$.
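For answer 7.6, the three repertoire equations can be solved for $A(n)$, $B(n)$, $C(n)$ and compared with the recurrence. The sketch below does this with integer arithmetic; the function names and the sample values of $\alpha$, $\beta$, $\gamma$ are arbitrary choices for illustration.

```python
def g_direct(n, alpha, beta, gamma):
    """Iterate g_n = g_{n-1} + 2 g_{n-2} + (-1)^n gamma from g_0 = alpha, g_1 = beta."""
    g = [alpha, beta]
    for k in range(2, n + 1):
        g.append(g[k - 1] + 2 * g[k - 2] + (-1) ** k * gamma)
    return g[n]

def g_closed(n, alpha, beta, gamma):
    """Solve A + 2B = 2^n, A - B = (-1)^n, -B + 3C = (-1)^n n for A, B, C."""
    s = (-1) ** n
    B = (2 ** n - s) // 3      # exact: 2^n - (-1)^n is a multiple of 3
    A = s + B
    C = (s * n + B) // 3       # exact, since C(n) is an integer
    return A * alpha + B * beta + C * gamma

print([g_direct(n, 3, 5, 7) for n in range(8)])
print([g_closed(n, 3, 5, 7) for n in range(8)])
```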
7.8 Differentiate $(1 - z)^{-x-1}$ twice with respect to $x$, obtaining
$$\binom{x+n}{n}\Bigl((H_{x+n} - H_x)^2 - (H^{(2)}_{x+n} - H^{(2)}_x)\Bigr).$$
Now set $x = m$.
7.9 $(n+1)(H_n^2 - H_n^{(2)}) - 2n(H_n - 1)$.
7.10 The identity $H_{k-1/2} - H_{-1/2} = \frac{2}{2k-1} + \cdots + \frac{2}{1} = 2H_{2k} - H_k$
implies that $\sum_k \binom{2k}{k}\binom{2n-2k}{n-k}(2H_{2k} - H_k) = 4^nH_n$.
7.11 (a) $C(z) = A(z)B(z^2)/(1 - z)$. (b) $zB'(z) = A(2z)e^z$, hence $A(z) =
\frac z2 e^{-z/2}B'(\frac z2)$. (c) $A(z) = B(z)/(1-z)^{r+1}$, hence $B(z) = (1-z)^{r+1}A(z)$ and we
have $f_k(r) = \binom{r+1}{k}(-1)^k$.
7.12 $C_n$. The numbers in the upper row correspond to the positions of $+1$'s
in a sequence of $+1$'s and $-1$'s that defines a "mountain range"; the numbers
in the lower row correspond to the positions of $-1$'s. For example, the given
array corresponds to
7.13 Extend the sequence periodically (let $x_{m+k} = x_k$) and define $s_n =
x_1 + \cdots + x_n$. We have $s_m = l$, $s_{2m} = 2l$, etc. There must be a largest index
$k_j$ such that $s_{k_j} = j$, $s_{k_j+m} = l + j$, etc. These indices $k_1, \ldots, k_l$ (modulo $m$)
specify the cyclic shifts in question.
For example, in the sequence $\langle -2, 1, -1, 0, 1, 1, -1, 1, 1, 1\rangle$ with $m = 10$
and $l = 2$ we have $k_1 = 17$, $k_2 = 24$.
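The example in answer 7.13 can be recomputed directly. The sketch assumes integer entries $x_k \le 1$ with positive total $l$ (so every level $j$ is actually attained), and it takes the cyclic shift determined by $k_j$ to start just after position $k_j \bmod m$; the function names are hypothetical.

```python
def shift_indices(x):
    """Largest index k_j with s_{k_j} = j, for j = 1..l, in the periodic extension of x."""
    m, l = len(x), sum(x)
    prefix, s = [], 0
    for v in x:
        s += v
        prefix.append(s)
    low = min(prefix)
    result = []
    for j in range(1, l + 1):
        periods = (j - low) // l + 1   # beyond this many periods, s_k > j forever
        best, s = None, 0
        for k in range(1, periods * m + 1):
            s += x[(k - 1) % m]
            if s == j:
                best = k
        result.append(best)
    return result

def partial_sums_positive(x, start):
    """True if the cyclic shift of x beginning after position `start` has all partial sums > 0."""
    m, s = len(x), 0
    for i in range(m):
        s += x[(start + i) % m]
        if s <= 0:
            return False
    return True

x = [-2, 1, -1, 0, 1, 1, -1, 1, 1, 1]
print(shift_indices(x))
```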
7.14 $\hat G(z) = -2z\hat G(z) + \hat G(z)^2 + z$ (be careful about the final term!) leads
via the quadratic formula to
$$\hat G(z) = \frac{1 + 2z - \sqrt{1 + 4z^2}}{2}.$$
Hence $g_{2n+1} = 0$ and $g_{2n} = (-1)^n(2n)!\,C_{n-1}$, for all $n > 0$.
7.15 There are $\binom{n}{k}b_{n-k}$ partitions with $k$ other objects in the subset
containing $n + 1$. Hence $\hat B'(z) = e^z\hat B(z)$. The solution to this differential equation
is $\hat B(z) = e^{e^z + c}$, and $c = -1$ since $\hat B(0) = 1$. (We can also get this result by
summing (7.49) on $m$, since $b_n = \sum_m {n \brace m}$.)
7.16 One way is to take the logarithm of
$$B(z) = 1/\bigl((1-z)^{a_1}(1-z^2)^{a_2}(1-z^3)^{a_3}(1-z^4)^{a_4}\ldots\bigr),$$
then use the formula for $\ln\frac{1}{1-z}$ and interchange the order of summation.
7.17 This follows since $\int_0^\infty t^n e^{-t}\,dt = n!$. There's also a formula that goes
in the other direction:
$$\hat G(z) = \frac{1}{2\pi}\int_{-\pi}^{+\pi} G(ze^{-i\theta})\,e^{e^{i\theta}}\,d\theta.$$
7.18 (a) $\zeta(z - \frac12)$; (b) $-\zeta'(z)$; (c) $\zeta(z)/\zeta(2z)$. Every positive integer is
uniquely representable as $m^2q$, where $q$ is squarefree.
7.19 If $n > 0$, the coefficient $[z^n]\exp(x\ln F(z))$ is a polynomial of degree $n$
in $x$ that's a multiple of $x$. The first convolution formula comes from equating
coefficients of $z^n$ in $F(z)^xF(z)^y = F(z)^{x+y}$. The second comes from equating
coefficients of $z^{n-1}$ in $F'(z)F(z)^{x-1}F(z)^y = F'(z)F(z)^{x+y-1}$, because we have
$$F'(z)F(z)^{x-1} = x^{-1}\frac{\partial}{\partial z}\bigl(F(z)^x\bigr) = x^{-1}\sum_{n\ge0} nf_n(x)z^{n-1}.$$
(Further convolutions follow by taking $\partial/\partial x$, as in (7.43).)
7.20 Let $G(z) = \sum_{n\ge0} g_nz^n$. Then
$$z^lG^{(k)}(z) = \sum_{n\ge0} n^{\underline{k}}\,g_nz^{n-k+l} = \sum_{n\ge0} (n+k-l)^{\underline{k}}\,g_{n+k-l}\,z^n$$
for all $k, l \ge 0$, if we regard $g_n = 0$ for $n < 0$. Hence if $P_0(z), \ldots, P_m(z)$ are
polynomials, not all zero, having maximum degree $d$, then there are polynomials
$p_0(n), \ldots, p_{m+d}(n)$ such that
$$P_0(z)G(z) + \cdots + P_m(z)G^{(m)}(z) = \sum_{n\ge0}\sum_{j=0}^{m+d} p_j(n)\,g_{n+j-d}\,z^n.$$
Therefore a differentiably finite $G(z)$ implies that
$$\sum_{j=0}^{m+d} p_j(n+d)\,g_{n+j} = 0, \qquad\text{for all } n \ge 0.$$
The converse is similar. (One consequence is that $G(z)$ is differentiably finite
if and only if the corresponding egf, $\hat G(z)$, is differentiably finite.)
This slow method of finding the answer is just the cashier's way of stalling until the police come.
The USA has two-cent pieces, but they haven't been minted since 1873.
7.21 This is the problem of giving change with denominations 10 and 20, so
$G(z) = 1/(1-z^{10})(1-z^{20}) = \check G(z^{10})$, where $\check G(z) = 1/(1-z)(1-z^2)$. (a) The
partial fraction decomposition of $\check G(z)$ is $\frac12(1-z)^{-2} + \frac14(1-z)^{-1} + \frac14(1+z)^{-1}$,
so $[z^n]\,\check G(z) = \frac14(2n + 3 + (-1)^n)$. Setting $n = 50$ yields 26 ways to make
the payment. (b) $\check G(z) = (1+z)/(1-z^2)^2 = (1+z)(1 + 2z^2 + 3z^4 + \cdots)$, so
$[z^n]\,\check G(z) = \lfloor n/2\rfloor + 1$. (Compare this with the value $N_n = \lfloor n/5\rfloor + 1$ in the
text's coin-changing problem. The bank robber's problem is equivalent to the
problem of making change with pennies and tuppences.)
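Both closed forms in answer 7.21 can be compared with a direct count. The sketch uses the standard coin-changing table (one pass per denomination); the function name `ways` is hypothetical, and the amount 500 corresponds to $n = 50$ in the answer.

```python
def ways(amount, coins):
    """Number of ways to pay `amount` using unlimited coins of the given denominations."""
    table = [1] + [0] * amount
    for c in coins:
        for v in range(c, amount + 1):
            table[v] += table[v - c]
    return table[amount]

print(ways(500, (10, 20)))                     # payment with tens and twenties
print([ways(n, (1, 2)) for n in range(10)])    # pennies and tuppences
```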
7.22 Each polygon has a "base" (the line segment at the bottom). If $A$
and $B$ are triangulated polygons, let $A\triangle B$ be the result of pasting the base
of $A$ to the upper left diagonal of $\triangle$, and pasting the base of $B$ to the upper
right diagonal. Thus, for example,
[figure: two triangulated polygons pasted onto a triangle]
(The polygons might need to be warped a bit and/or banged into shape.)
Every triangulation arises in this way, because the base line is part of a unique
triangle and there are triangulated polygons $A$ and $B$ at its left and right.
Replacing each triangle by $z$ gives a power series in which the coefficient
of $z^n$ is the number of triangulations with $n$ triangles, namely the number of
ways to decompose an $(n+2)$-gon into triangles. Since $P = 1 + zP^2$, this is the
generating function for Catalan numbers $C_0 + C_1z + C_2z^2 + \cdots$; the number
of ways to triangulate an $n$-gon is $C_{n-2} = \binom{2n-4}{n-2}/(n-1)$.
7.23 Let $a_n$ be the stated number, and $b_n$ the number of ways with a $2\times1\times1$
notch missing at the top. By considering the possible patterns visible on the
top surface, we have
$$a_n = 2a_{n-1} + 4b_{n-1} + a_{n-2} + [n=0];$$
$$b_n = a_{n-1} + b_{n-1}.$$
Hence the generating functions satisfy $A = 2zA + 4zB + z^2A + 1$, $B = zA + zB$,
and we have
$$A(z) = \frac{1-z}{(1+z)(1-4z+z^2)}.$$
This formula relates to the problem of $3\times n$ domino tilings; we have $a_n =
\frac13(U_{2n} + V_{2n+1} + (-1)^n) = \frac16(2+\sqrt3)^{n+1} + \frac16(2-\sqrt3)^{n+1} + \frac13(-1)^n$, which is
$(2+\sqrt3)^{n+1}/6$ rounded to the nearest integer.
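The recurrences of answer 7.23 and the rounding rule can be compared numerically. This is a small sketch with a hypothetical function name; it uses floating point for the closed form, which is accurate enough for the modest values of $n$ shown.

```python
from math import sqrt

def brick_counts(n_max):
    """a_n from a_n = 2a_{n-1} + 4b_{n-1} + a_{n-2} + [n=0], b_n = a_{n-1} + b_{n-1}."""
    a, b = [1], [0]
    for n in range(1, n_max + 1):
        a.append(2 * a[n - 1] + 4 * b[n - 1] + (a[n - 2] if n >= 2 else 0))
        b.append(a[n - 1] + b[n - 1])
    return a

a = brick_counts(15)
print(a[:6])
print([round((2 + sqrt(3)) ** (n + 1) / 6) for n in range(6)])
```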
7.24 $n\sum_{k_1+\cdots+k_m=n} k_1\cdot\ldots\cdot k_m/m = F_{2n+1} + F_{2n-1} - 2$. (Consider the
coefficient $[z^{n-1}]\,\frac{d}{dz}\ln\bigl(1/(1 - G(z))\bigr)$, where $G(z) = z/(1-z)^2$.)
7.25 The generating function is $P(z)/(1 - z^m)$, where $P(z) = z + 2z^2 +
\cdots + (m-1)z^{m-1} = \bigl((m-1)z^{m+1} - mz^m + z\bigr)/(1-z)^2$. The denominator
is $Q(z) = 1 - z^m = (1 - \omega^0z)(1 - \omega^1z)\ldots(1 - \omega^{m-1}z)$. By the rational
expansion theorem for distinct roots, we obtain
$$n \bmod m = \frac{m-1}{2} + \sum_{k=1}^{m-1}\frac{\omega^{-kn}}{\omega^k - 1}.$$
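The root-of-unity formula for $n \bmod m$ in answer 7.25 can be checked in floating point. The sketch assumes $\omega = e^{2\pi i/m}$; the function name is hypothetical.

```python
import cmath

def mod_via_roots(n, m):
    """(m-1)/2 + sum over k=1..m-1 of w^(-kn) / (w^k - 1), with w = exp(2 pi i / m)."""
    w = cmath.exp(2j * cmath.pi / m)
    return (m - 1) / 2 + sum(w ** (-k * n) / (w ** k - 1) for k in range(1, m))

for n in range(8):
    value = mod_via_roots(n, 5)
    print(n, round(value.real, 9), round(abs(value.imag), 9))
```

The imaginary parts cancel in conjugate pairs, so the printed value is real up to rounding error.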
7.26 $(1 - z - z^2)\,\mathfrak F(z) = F(z)$ leads to $\mathfrak F_n = \bigl(2(n+1)F_n + nF_{n+1}\bigr)/5$ as in
equation (7.60).
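7.27 Each oriented cycle pattern begins with [figure] or [figure] or a $2\times k$ cycle (for
some $k \ge 2$) oriented in one of two ways. Hence
$$Q_n = Q_{n-1} + Q_{n-2} + 2Q_{n-2} + 2Q_{n-3} + \cdots + 2Q_0$$
for $n \ge 2$; $Q_0 = Q_1 = 1$. The generating function is therefore
$$Q(z) = zQ(z) + z^2Q(z) + 2z^2Q(z)/(1-z) + 1$$
$$= 1/\bigl(1 - z - z^2 - 2z^2/(1-z)\bigr) = \frac{1-z}{1 - 2z - 2z^2 + z^3}$$
$$= \frac{\phi^2/5}{1 - \phi^2z} + \frac{\phi^{-2}/5}{1 - \phi^{-2}z} + \frac{2/5}{1+z},$$
and $Q_n = \bigl(\phi^{2n+2} + \phi^{-2n-2} + 2(-1)^n\bigr)/5 = \bigl((\phi^{n+1} - \hat\phi^{n+1})/\sqrt5\bigr)^2 = F_{n+1}^2$.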
7.28 In general if $A(z) = (1 + z + \cdots + z^{m-1})B(z)$, we have $A_r + A_{r+m} +
A_{r+2m} + \cdots = B(1)$ for $0 \le r < m$. In this case $m = 10$ and $B(z) =
(1 + z + \cdots + z^9)(1 + z^2 + z^4 + z^6 + z^8)(1 + z^5)$.
7.29 $F(z) + F(z)^2 + F(z)^3 + \cdots = z/(1 - z - z^2 - z) =
\bigl(1/(1 - (1+\sqrt2)z) - 1/(1 - (1-\sqrt2)z)\bigr)/\sqrt8$,
so the answer is $\bigl((1+\sqrt2)^n - (1-\sqrt2)^n\bigr)/\sqrt8$.
7.30 $\sum_{k=1}^{n}\binom{2n-1-k}{n-1}\bigl(a^nb^{n-k}/(1-\alpha z)^k + a^{n-k}b^n/(1-\beta z)^k\bigr)$, by exercise 5.39.
7.31 The dgf is $\zeta(z)^2/\zeta(z-1)$; hence we find $g(n)$ is the product of $(k+1-kp)$
over all prime powers $p^k$ that exactly divide $n$.
7.32 We may assume that each $b_k \ge 0$. A set of arithmetic progressions
forms an exact cover if and only if
$$\frac{1}{1-z} = \frac{z^{b_1}}{1 - z^{a_1}} + \cdots + \frac{z^{b_m}}{1 - z^{a_m}}.$$
Subtract $z^{b_m}/(1 - z^{a_m})$ from both sides and set $z = e^{2\pi i/a_m}$. The left side is
infinite, and the right side will be finite unless $a_{m-1} = a_m$.
7.33 $(-1)^{n-m+1}[n>m]/(n - m)$.
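7.34 We can also write $G_n(z) = \sum_{k_1+(m+1)k_{m+1}=n}\binom{k_1+k_{m+1}}{k_{m+1}}(z^m)^{k_{m+1}}$. In
general, if
$$G_n = \sum_{k_1+2k_2+\cdots+rk_r=n}\binom{k_1+k_2+\cdots+k_r}{k_1,k_2,\ldots,k_r}z_1^{k_1}z_2^{k_2}\ldots z_r^{k_r},$$
we have $G_n = z_1G_{n-1} + z_2G_{n-2} + \cdots + z_rG_{n-r} + [n=0]$, and the generating
function is $1/(1 - z_1w - z_2w^2 - \cdots - z_rw^r)$. In the stated special case the
answer is $1/(1 - w - z^mw^{m+1})$. (See (5.74) for the case $m = 1$.)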
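7.35 (a) $\frac1n\sum_{0<k<n}\bigl(1/k + 1/(n-k)\bigr) = \frac2nH_{n-1}$. (b) $[z^n]\bigl(\ln\frac{1}{1-z}\bigr)^2 = \frac{2!}{n!}{n \brack 2} = \frac2nH_{n-1}$
by (7.50) and (6.58). Another way to do part (b) is to use the rule
$[z^n]\,F(z) = \frac1n[z^{n-1}]\,F'(z)$ with $F(z) = \bigl(\ln\frac{1}{1-z}\bigr)^2$.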
7.36
+$A(P).
7.37 (a) The amazing identity $a_{2n} = a_{2n+1} = b_n$ holds in the table
n     0  1  2  3   4   5   6   7   8   9  10
a_n   1  1  2  2   4   4   6   6  10  10  14
b_n   1  2  4  6  10  14  20  26  36  46  60
(b) $A(z) = 1/\bigl((1-z)(1-z^2)(1-z^4)(1-z^8)\ldots\bigr)$. (c) $B(z) = A(z)/(1-z)$,
and we want to show that $A(z) = (1+z)B(z^2)$. This follows from $A(z) =
A(z^2)/(1-z)$.
7.38 $(1 - wz)M(w,z) = \sum_{m,n\ge1}\bigl(\min(m,n) - \min(m-1,n-1)\bigr)w^mz^n =
\sum_{m,n\ge1} w^mz^n = wz/(1-w)(1-z)$. In general,
$$M(z_1,\ldots,z_m) = \frac{z_1\ldots z_m}{(1-z_1)\ldots(1-z_m)(1 - z_1\ldots z_m)}.$$
7.39 The answers to the hint are
$$\sum_{1\le k_1<k_2<\cdots<k_m\le n} a_{k_1}a_{k_2}\ldots a_{k_m} \qquad\text{and}\qquad \sum_{1\le k_1\le k_2\le\cdots\le k_m\le n} a_{k_1}a_{k_2}\ldots a_{k_m},$$
respectively. Therefore: (a) We want the coefficient of $z^m$ in the product
$(1+z)(1+2z)\ldots(1+nz)$. This is the reflection of $(z+1)^{\overline n}$, so it is
${n+1 \brack n+1} + {n+1 \brack n}z + \cdots + {n+1 \brack 1}z^n$ and the answer is ${n+1 \brack n+1-m}$. (b) The coefficient of
$z^m$ in $1/\bigl((1-z)(1-2z)\ldots(1-nz)\bigr)$ is ${m+n \brace n}$ by (7.47).
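7.40 The egf for $\langle nF_{n-1} - F_n\rangle$ is $(z-1)\hat F(z)$ where $\hat F(z) = \sum_{n\ge0}F_nz^n/n! =
(e^{\phi z} - e^{\hat\phi z})/\sqrt5$. The egf for $\langle n¡\rangle$ is $e^{-z}/(1-z)$. The product is
$$5^{-1/2}\bigl(e^{(\hat\phi-1)z} - e^{(\phi-1)z}\bigr) = 5^{-1/2}\bigl(e^{-\phi z} - e^{-\hat\phi z}\bigr).$$
We have $\hat F(z)e^{-z} = -\hat F(-z)$. So the answer is $(-1)^nF_n$.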
7.41 The number of up-down permutations with the largest element $n$ in
position $2k$ is $\binom{n-1}{2k-1}A_{2k-1}A_{n-2k}$. Similarly, the number of up-down permutations
with the smallest element 1 in position $2k+1$ is $\binom{n-1}{2k}A_{2k}A_{n-2k-1}$,
because down-up permutations and up-down permutations are equally numerous.
Summing over all possibilities gives
$$2A_n = \sum_k\binom{n-1}{k}A_kA_{n-1-k} + 2[n=0] + [n=1].$$
The egf $\hat A$ therefore satisfies $2\hat A'(z) = \hat A(z)^2 + 1$ and $\hat A(0) = 1$; the given
function solves this differential equation.
7.42 Let $a_n$ be the number of Martian DNA strings that don't end with $c$
or $e$; let $b_n$ be the number that do. Then
$$a_n = 3a_{n-1} + 2b_{n-1} + [n=0], \qquad b_n = 2a_{n-1} + b_{n-1};$$
$$A(z) = 3zA(z) + 2zB(z) + 1, \qquad B(z) = 2zA(z) + zB(z);$$
and the total number is $[z^n]\,(1+z)/(1 - 4z - z^2) = F_{3n+2}$.
7.43 By (5.45), $g_n = \Delta^n\hat G(0)$. The $n$th difference of a product can be
written
$$\Delta^nA(z)B(z) = \sum_k\binom nk\bigl(\Delta^kE^{n-k}A(z)\bigr)\bigl(\Delta^{n-k}B(z)\bigr),$$
and $E^{n-k} = (1+\Delta)^{n-k} = \sum_j\binom{n-k}{j}\Delta^j$. Therefore we find
$$h_n = \sum_{j,k}\binom nk\binom{n-k}{j}f_{j+k}\,g_{n-k}.$$
This is a sum over all trinomial coefficients; it can be put into the more
symmetric form
$$h_n = \sum_{j+k+l=n}\binom{n}{j,k,l}f_{j+k}\,g_{k+l}.$$
The empty set is pointless.
7.44 Each partition into $k$ nonempty subsets can be ordered in $k!$ ways, so
$b_k = k!$. Thus $\hat Q(z) = \sum_{n,k\ge0}{n \brace k}k!\,z^n/n! = \sum_{k\ge0}(e^z - 1)^k = 1/(2 - e^z)$.
And this is the geometric series $\sum_{k\ge0}e^{kz}/2^{k+1}$, hence $a_k = 1/2^{k+1}$. Finally,
$c_k = 2^k$; consider all permutations when the $x$'s are distinct, change each '>'
between subscripts to '<' and allow each '<' between subscripts to become
either '<' or '='. (For example, the permutation $x_1x_3x_2$ produces $x_1 < x_3 < x_2$
and $x_1 = x_3 < x_2$, because $1 < 3 > 2$.)
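7.45 This sum is $\sum_{n\ge1}r(n)/n^2$, where $r(n)$ is the number of ways to write
$n$ as a product of two relatively prime factors. If $n$ is divisible by $t$ distinct
primes, $r(n) = 2^t$. Hence $r(n)/n^2$ is multiplicative and the sum is
$$\prod_p\Bigl(1 + \frac{2}{p^2} + \frac{2}{p^4} + \cdots\Bigr) = \prod_p\Bigl(1 + \frac{2}{p^2-1}\Bigr) = \prod_p\Bigl(\frac{p^2+1}{p^2-1}\Bigr) = \zeta(2)^2/\zeta(4) = \frac52.$$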
7.46 Let $S_n = \sum_{0\le k\le n/2}\binom{n-2k}{k}\alpha^k$. Then $S_n = S_{n-1} + \alpha S_{n-3} + [n=0]$, and
the generating function is $1/(1 - z - \alpha z^3)$. When $\alpha = -\frac{4}{27}$, the hint tells us
that this has a nice factorization $1/(1 + \frac13z)(1 - \frac23z)^2$. The general expansion
theorem now yields $S_n = (\frac23n + c)(\frac23)^n + \frac19(-\frac13)^n$, and the remaining constant $c$
turns out to be $\frac89$.
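The constants in answer 7.46 can be verified exactly with rational arithmetic. This is a minimal sketch with hypothetical function names; it takes $\alpha = -4/27$ and $c = 8/9$ as stated above.

```python
from fractions import Fraction
from math import comb

def s_direct(n, alpha):
    """S_n = sum over k of C(n-2k, k) alpha^k (terms with k > n-2k vanish)."""
    return sum(comb(n - 2 * k, k) * alpha ** k for k in range(n // 2 + 1))

def s_closed(n):
    """(2n/3 + 8/9)(2/3)^n + (1/9)(-1/3)^n."""
    return (Fraction(2, 3) * n + Fraction(8, 9)) * Fraction(2, 3) ** n \
        + Fraction(1, 9) * Fraction(-1, 3) ** n

alpha = Fraction(-4, 27)
print([s_direct(n, alpha) for n in range(6)])
print([s_closed(n) for n in range(6)])
```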
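7.47 The Stern–Brocot representation of $\sqrt3$ is $R(LR^2)^\infty$, because
$$\sqrt3 + 1 = 2 + \cfrac{1}{1 + \cfrac{1}{\sqrt3 + 1}}.$$
The fractions are $\frac11, \frac21, \frac32, \frac53, \frac74, \frac{12}{7}, \frac{19}{11}, \frac{26}{15}, \ldots$; they eventually have the cyclic
pattern
$$\frac{V_{2n-1}+V_{2n+1}}{U_{2n}},\ \ \frac{U_{2n}+V_{2n+1}}{V_{2n+1}},\ \ \frac{U_{2n+2}+V_{2n-1}}{U_{2n}+V_{2n+1}},\ \ \frac{V_{2n+1}+V_{2n+3}}{U_{2n+2}},\ \ \ldots.$$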
7.48 We have $g_0 = 0$, and if $g_1 = m$ the generating function satisfies
$$aG(z) + bz^{-1}G(z) + cz^{-2}\bigl(G(z) - mz\bigr) + \frac{d}{1-z} = 0.$$
Hence $G(z) = P(z)/(az^2 + bz + c)(1-z)$ for some polynomial $P(z)$. Let $\rho_1$
and $\rho_2$ be the roots of $cz^2 + bz + a$, with $|\rho_1| \ge |\rho_2|$. If $b^2 - 4ac \le 0$ then
$|\rho_1|^2 = \rho_1\rho_2 = a/c$ is rational, contradicting the fact that $\sqrt[n]{g_n}$ approaches
$1 + \sqrt2$. Hence $\rho_1 = (-b + \sqrt{b^2 - 4ac})/2c = 1 + \sqrt2$; and this implies that
$a = -c$, $b = -2c$, $\rho_2 = 1 - \sqrt2$. The generating function now takes the form
$$G(z) = \frac{z\bigl(m - (r+m)z\bigr)}{(1 - 2z - z^2)(1-z)} = \frac{-r + (2m+r)z}{2(1 - 2z - z^2)} + \frac{r}{2(1-z)} = mz + (2m-r)z^2 + \cdots,$$
where $r = d/c$. Since $g_2$ is an integer, $r$ is an integer. We also have
$$g_n = \alpha(1+\sqrt2)^n + \hat\alpha(1-\sqrt2)^n + \tfrac12r = \lfloor\alpha(1+\sqrt2)^n\rfloor,$$
and this can hold only if $r = -1$, because $(1-\sqrt2)^n$ alternates in sign as
it approaches zero. Hence $(a, b, c, d) = \pm(1, 2, -1, 1)$. Now we find $\alpha =
\frac14(1 + \sqrt2\,m)$, which is between 0 and 1 only if $0 \le m \le 2$. Each of
these values actually gives a solution; the sequences $\langle g_n\rangle$ are $\langle 0, 0, 1, 3, 8, \ldots\rangle$,
$\langle 0, 1, 3, 8, 20, \ldots\rangle$, and $\langle 0, 2, 5, 13, 32, \ldots\rangle$.
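The three solutions at the end of answer 7.48 can be regenerated from the recurrence implied by $(a,b,c,d) = (1,2,-1,1)$, namely $g_{n+2} = 2g_{n+1} + g_n + 1$, and compared with $\lfloor\alpha(1+\sqrt2)^n\rfloor$. The sketch uses floating point, which is safe here because $\alpha(1+\sqrt2)^n$ stays well away from integers for the range tested; the function names are hypothetical.

```python
from math import floor, sqrt

def sequence(m, length):
    """g_0 = 0, g_1 = m, g_{n+2} = 2 g_{n+1} + g_n + 1."""
    g = [0, m]
    while len(g) < length:
        g.append(2 * g[-1] + g[-2] + 1)
    return g[:length]

def floor_form(m, length):
    """floor(alpha (1 + sqrt 2)^n) with alpha = (1 + sqrt(2) m) / 4."""
    alpha = (1 + sqrt(2) * m) / 4
    return [floor(alpha * (1 + sqrt(2)) ** n) for n in range(length)]

for m in (0, 1, 2):
    print(m, sequence(m, 6), floor_form(m, 6))
```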
7.49 (a) The denominator of $\bigl(1/(1 - (1+\sqrt2)z) + 1/(1 - (1-\sqrt2)z)\bigr)$ is
$1 - 2z - z^2$; hence $a_n = 2a_{n-1} + a_{n-2}$ for $n \ge 2$. (b) True because $a_n$ is even
and $-1 < 1 - \sqrt2 < 0$. (c) Let
$$b_n = \Bigl(\frac{p + \sqrt q}{2}\Bigr)^n + \Bigl(\frac{p - \sqrt q}{2}\Bigr)^n.$$
We would like $b_n$ to be odd for all $n > 0$, and $-1 < (p - \sqrt q)/2 < 0$. Working
as in part (a), we find $b_0 = 2$, $b_1 = p$, and $b_n = pb_{n-1} + \frac14(q - p^2)b_{n-2}$ for
$n \ge 2$. One satisfactory solution has $p = 3$ and $q = 17$.
7.50 Extending the multiplication idea of exercise 22, we have
[figure: $Q$ expressed as the base line alone, plus copies of $Q$ pasted onto the upper sides of a triangle, a quadrilateral, a pentagon, and so on]
Replace each $n$-gon by $z^{n-2}$. This substitution behaves properly under
multiplication, because the pasting operation takes an $m$-gon and an $n$-gon into
an $(m+n-2)$-gon. Thus the generating function is
$$Q = 1 + zQ^2 + z^2Q^3 + z^3Q^4 + \cdots = 1 + \frac{zQ^2}{1 - zQ}$$
and the quadratic formula gives $Q = (1 + z - \sqrt{1 - 6z + z^2})/4z$. The coefficient
of $z^{n-2}$ in this power series is the number of ways to put nonoverlapping
diagonals into a convex $n$-gon. These coefficients apparently have no closed
form in terms of other quantities that we have discussed in this book, but
their asymptotic behavior is known [173, exercise 2.2.1–12].
Give me Legendre polynomials and I'll give you a closed form.
Incidentally, if each $n$-gon in $Q$ is replaced by $wz^{n-2}$ we get
$$Q = \frac{1 + z - \sqrt{1 - (4w+2)z + z^2}}{2(1+w)z},$$
a formula in which the coefficient of $w^mz^{n-2}$ is the number of ways to divide
an $n$-gon into $m$ polygons by nonintersecting diagonals.
7.51 The key first step is to observe that the square of the number of ways
is the number of cycle patterns of a certain kind, generalizing exercise 27.
These can be enumerated by evaluating the determinant of a matrix whose
eigenvalues are not difficult to determine. When $m = 3$ and $n = 4$, the fact
that $\cos 36^\circ = \phi/2$ is helpful (exercise 6.46).
7.52 The first few cases are $p_0(y) = 1$, $p_1(y) = y$, $p_2(y) = y^2 + y$,
$p_3(y) = y^3 + 3y^2 + 3y$. Let $p_n(y) = q_{2n}(x)$ where $y = x(1 - x)$; we
seek a generating function that defines $q_{2n+1}(x)$ in a convenient way. One
such function is $\sum_n q_n(x)z^n/n! = 2e^{ixz}/(e^{iz} + 1)$, from which it follows that
$q_n(x) = i^nE_n(x)$, where $E_n(x)$ is called an Euler polynomial. We have
$\sum(-1)^xx^n\,\delta x = \frac12(-1)^{x+1}E_n(x)$, so Euler polynomials are analogous to
Bernoulli polynomials, and they have factors analogous to those in (6.98). By
exercise 6.23 we have $nE_{n-1}(x) = \sum_{k=0}^{n}\binom nkB_kx^{n-k}(2 - 2^{k+1})$; this polynomial
has integer coefficients by exercise 6.54. Hence $q_{2n}(x)$, whose coefficients
have denominators that are powers of 2, must have integer coefficients. Hence
$p_n(y)$ has integer coefficients. Finally, the relation $(4y - 1)p_n''(y) + 2p_n'(y) =
2n(2n-1)p_{n-1}(y)$ shows that
$$2m(2m-1)\left|{n \atop m}\right| = m(m+1)\left|{n \atop m+1}\right| + 2n(2n-1)\left|{n-1 \atop m-1}\right|,$$
and it follows that the $\left|{n \atop m}\right|$'s are positive. (A similar proof shows that the
related quantity $(-1)^n(2n+2)E_{2n+1}(x)/(2x-1)$ has positive integer coefficients,
when expressed as an $n$th degree polynomial in $y$.) It can be shown
that $\left|{n \atop 1}\right|$ is the Genocchi number $(-1)^{n-1}(2^{2n+1} - 2)B_{2n}$ (see exercise 6.24),
and that $\left|{n \atop n-1}\right| = \binom n2$, $\left|{n \atop n-2}\right| = 2\binom n4 + 3\binom{n+1}{4}$, etc.
7.53 It is
P(l+V~~,-,+V4,,r~1/6.
Thus, for example, $T_{20} = P_{12} = 210$; $T_{285} = P_{165} = 40755$.
7.54 Let $E_k$ be the operation on power series that sets all coefficients to zero
except those of $z^n$ where $n \bmod m = k$. The stated construction is equivalent
to the operation
$$E_0\,S\,E_0\,S\,(E_0+E_1)\,S\,\ldots\,S\,(E_0+E_1+\cdots+E_{m-1})$$
applied to $1/(1-z)$, where $S$ means "multiply by $1/(1-z)$." There are $m!$
terms
$$E_0\,S\,E_{k_1}\,S\,E_{k_2}\,S\,\ldots\,S\,E_{k_m}$$
where $0 \le k_j < j$, and every such term evaluates to $z^{rm}/(1 - z^m)^{m+1}$ if $r$ is the
number of places where $k_j < k_{j+1}$. Exactly $\left\langle{m \atop r}\right\rangle$ terms have a given value of $r$,
so the coefficient of $z^{mn}$ is $\sum_{r=0}^{m-1}\left\langle{m \atop r}\right\rangle\binom{m+n-r}{m} = (n+1)^m$ by (6.37). (The fact
that operation $E_k$ can be expressed with complex roots of unity seems to be
of no help in this problem.)
7.55 Suppose that $P_0(z)F(z) + \cdots + P_m(z)F^{(m)}(z) = Q_0(z)G(z) + \cdots +
Q_n(z)G^{(n)}(z) = 0$, where $P_m(z)$ and $Q_n(z)$ are nonzero. (a) Let $H(z) =
F(z) + G(z)$. Then there are rational functions $R_{k,l}(z)$ for $0 \le l < m + n$ such
that $H^{(k)}(z) = R_{k,0}(z)F^{(0)}(z) + \cdots + R_{k,m-1}(z)F^{(m-1)}(z) + R_{k,m}(z)G^{(0)}(z) +
\cdots + R_{k,m+n-1}(z)G^{(n-1)}(z)$. The $m+n+1$ vectors $(R_{k,0}(z), \ldots, R_{k,m+n-1}(z))$
are linearly dependent in the $(m+n)$-dimensional vector space whose
components are rational functions; hence there are rational functions $S_l(z)$, not
all zero, such that $S_0(z)H^{(0)}(z) + \cdots + S_{m+n}(z)H^{(m+n)}(z) = 0$. (b) Similarly,
let $H(z) = F(z)G(z)$. There are rational $R_{k,l}(z)$ for $0 \le l < mn$
with $H^{(k)}(z) = \sum_{i=0}^{m-1}\sum_{j=0}^{n-1}R_{k,ni+j}(z)F^{(i)}(z)G^{(j)}(z)$, hence $S_0(z)H^{(0)}(z) + \cdots +
S_{mn}(z)H^{(mn)}(z) = 0$ for some rational $S_l(z)$, not all zero. (A similar proof
shows that if $\langle f_n\rangle$ and $\langle g_n\rangle$ are polynomially recursive, so are $\langle f_n + g_n\rangle$ and
$\langle f_ng_n\rangle$. Incidentally, there is no similar result for quotients; for example, $\cos z$
is differentiably finite, but $1/\cos z$ is not.)
7.56 Euler showed, incidentally, that this number is also $[z^n]\,1/\sqrt{1 - 2z - 3z^2}$,
and he gave the formula $a_n = \sum_{k\ge0}n^{\underline{2k}}/k!^2$. He also discovered a "memorable
failure of induction" while examining these numbers: Although $3a_n - a_{n+1}$ is
equal to $F_{n-1}(F_{n-1} + 1)$ for $0 \le n < 9$, this empirical law mysteriously breaks
down when $n$ is 9 or more!
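Euler's "memorable failure of induction" from answer 7.56 is easy to reproduce. The sketch takes $a_n = \sum_k n^{\underline{2k}}/k!^2 = \sum_k\binom{n}{2k}\binom{2k}{k}$ from the formula above and uses $F_{-1} = 1$; the function names are hypothetical.

```python
from math import comb

def a(n):
    """a_n = sum over k of C(n, 2k) C(2k, k), i.e. the sum of n^(2k falling) / k!^2."""
    return sum(comb(n, 2 * k) * comb(2 * k, k) for k in range(n // 2 + 1))

def fib(k):
    """Fibonacci numbers for k >= -1, with F_{-1} = 1, F_0 = 0, F_1 = 1."""
    x, y = 1, 0                 # (F_{-1}, F_0)
    for _ in range(k + 1):
        x, y = y, x + y
    return x

for n in range(12):
    lhs = 3 * a(n) - a(n + 1)
    rhs = fib(n - 1) * (fib(n - 1) + 1)
    print(n, lhs, rhs, lhs == rhs)
```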
7.57 (Paul Erdős currently offers $500 for a solution.)
8.1 $\frac{1}{24} + \frac{1}{48} + \frac{1}{48} + \frac{1}{48} + \frac{1}{48} + \frac{1}{24} = \frac16$. (In fact, we always get doubles with
probability $\frac16$ when at least one of the dice is fair.) Any two faces whose sum
is 7 have the same probability in distribution $\Pr_1$, so $S = 7$ has the same
probability as doubles.
8.2 There are 12 ways to specify the top and bottom cards and $50!$ ways to
arrange the others; so the probability is $12\cdot50!/52! = 12/(51\cdot52) = \frac{1}{17\cdot13} = \frac{1}{221}$.
8.3 $\frac{1}{10}(3+2+\cdots+9+2) = 4.8$; $\frac19(3^2+2^2+\cdots+9^2+2^2 - 10(4.8)^2) = \frac{388}{45} \approx 8.6$.
The true mean and variance with a fair coin are 6 and 22, so Stanford had
an unusually heads-up class. The corresponding Princeton figures are 6.4 and
$\frac{562}{45} \approx 12.5$. (This distribution has $\kappa_4 = 2974$, which is rather large. Hence the
standard deviation of this variance estimate when $n = 10$ is also rather large,
$\sqrt{2974/10 + 2(22)^2/9} \approx 20.1$ according to exercise 54. One cannot complain
that the students cheated.)
8.4 This follows from (8.38) and (8.39), because $F(z) = G(z)H(z)$. (A
similar formula holds for all the cumulants, even though $F(z)$ and $G(z)$ may
have negative coefficients.)
8.5 Replace H by $p$ and T by $q = 1 - p$. If $S_A = S_B = \frac12$ we have
$p^2qN = \frac12$ and $pq^2N = \frac12q + \frac12$; the solution is $p = 1/\phi^2$, $q = 1/\phi$.
8.6 In this case $X|y$ has the same distribution as $X$, for all $y$, hence
$E(X|Y) = EX$ is constant and $V(E(X|Y)) = 0$. Also $V(X|Y)$ is constant and
equal to its expected value.
8.7 We have $1 = (p_1 + p_2 + \cdots + p_6)^2 \le 6(p_1^2 + p_2^2 + \cdots + p_6^2)$ by Chebyshev's
summation inequality of Chapter 2.
8.8 Let $p = \Pr(\omega\in A\cap B)$, $q = \Pr(\omega\notin A)$, and $r = \Pr(\omega\notin B)$. Then
$p + q + r = 1$, and the identity to be proved is $p = (p+r)(p+q) - qr$.
8.9 This is true (subject to the obvious proviso that $F$ and $G$ are defined
on the respective ranges of $X$ and $Y$), because
$$\Pr(F(X)=f \text{ and } G(Y)=g) = \sum_{x\in F^{-1}(f),\ y\in G^{-1}(g)}\Pr(X=x \text{ and } Y=y)$$
$$= \sum_{x\in F^{-1}(f),\ y\in G^{-1}(g)}\Pr(X=x)\cdot\Pr(Y=y)$$
$$= \Pr(F(X)=f)\cdot\Pr(G(Y)=g).$$
8.10 Two. Let $x_1 < x_2$ be medians; then $1 \le \Pr(X\le x_1) + \Pr(X\ge x_2) \le
1$, hence equality holds. (Some discrete distributions have no median elements.
For example, let $\Omega$ be the set of all fractions of the form $\pm1/n$, with
$\Pr(+1/n) = \Pr(-1/n) = \frac{3}{\pi^2}n^{-2}$.)
8.11 For example, let $K = k$ with probability $4/\bigl((k+1)(k+2)(k+3)\bigr)$, for all
integers $k \ge 0$. Then $EK = 1$, but $E(K^2) = \infty$. (Similarly we can construct
random variables with finite cumulants through $\kappa_m$ but with $\kappa_{m+1} = \infty$.)
8.12 (a) Let $p_k = \Pr(X = k)$. If $0 < x < 1$, we have $\Pr(X \le r) = \sum_{k\le r}p_k \le
\sum_{k\le r}x^{k-r}p_k \le \sum_kx^{k-r}p_k = x^{-r}P(x)$. The other inequality has a similar
proof. (b) Let $x = \alpha/(1 - \alpha)$ to minimize the right-hand side. (A more
precise estimate for the given sum is obtained in exercise 9.42.)
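8.13 (Solution by Boris Pittel.) Let us set $Y = (X_1 + \cdots + X_n)/n$ and
$Z = (X_{n+1} + \cdots + X_{2n})/n$. Then
$$\Pr\Bigl(\bigl|\tfrac{Y+Z}{2} - \alpha\bigr| \le |Y - \alpha|\Bigr) \ge \Pr\Bigl(\bigl|\tfrac12(Y-\alpha)\bigr| + \bigl|\tfrac12(Z-\alpha)\bigr| \le |Y-\alpha|\Bigr)$$
$$= \Pr\bigl(|Z-\alpha| \le |Y-\alpha|\bigr) \ge \tfrac12.$$
The last inequality is, in fact, '>' in any discrete probability distribution,
because $\Pr(Y = Z) > 0$.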
8.14 Mean(H) = p Mean(F) + q Mean(G); Var(H) = p Var(F) + q Var(G) +
pq(Mean(F) − Mean(G))$^2$. (A mixture is actually a special case of conditional
probabilities: Let $Y$ be the coin, let $X|\mathrm H$ be generated by $F(z)$, and let $X|\mathrm T$
be generated by $G(z)$. Then $VX = EV(X|Y) + VE(X|Y)$, where $EV(X|Y) =
pV(X|\mathrm H) + qV(X|\mathrm T)$ and $VE(X|Y)$ is the variance of $pz^{\mathrm{Mean}(F)} + qz^{\mathrm{Mean}(G)}$.)
8.15 By the chain rule, $H'(z) = G'(z)F'(G(z))$; $H''(z) = G''(z)F'(G(z)) +
G'(z)^2F''(G(z))$. Hence
Mean(H) = Mean(F) Mean(G);
Var(H) = Var(F) Mean(G)$^2$ + Mean(F) Var(G).
(The random variable corresponding to probability distribution H can be un-
derstood as follows: Determine a nonnegative integer n by distribution F;
then add the values of n independent random variables that have distribu-
tion G. The identity for variance in this exercise is a special case of (8.105),
when X has distribution H and Y has distribution F.)
8.16 $e^{w(z-1)}/(1 - w)$.
8.17 $\Pr(Y_{n,p} \le m) = \Pr(Y_{n,p} + n \le m + n)$ = probability that we need $\le
m + n$ tosses to obtain $n$ heads = probability that $m + n$ tosses yield $\ge n$
heads = $\Pr(X_{m+n,p} \ge n)$. Thus
$$\sum_{k\le m}\binom{n+k-1}{k}p^nq^k = \sum_{k\ge n}\binom{m+n}{k}p^kq^{m+n-k} = \sum_{k\le m}\binom{m+n}{k}p^{m+n-k}q^k;$$
and this is (5.19) with $n = r$, $x = q$, $y = p$.
8.18 (a) $G_X(z) = e^{\mu(z-1)}$. (b) The $m$th cumulant is $\mu$, for all $m \ge 1$. (The
case $\mu = 1$ is called $F_\infty$ in (8.55).)
8.19 (a) $G_{X_1+X_2}(z) = G_{X_1}(z)G_{X_2}(z) = e^{(\mu_1+\mu_2)(z-1)}$. Hence the probability
is $e^{-\mu_1-\mu_2}(\mu_1+\mu_2)^n/n!$; the sum of independent Poisson variables is Poisson.
(b) In general, if $K_mX$ denotes the $m$th cumulant of a random variable $X$, we
have $K_m(aX_1 + bX_2) = a^m(K_mX_1) + b^m(K_mX_2)$, when $a, b \ge 0$. Hence the
answer is $2^m\mu_1 + 3^m\mu_2$.
8.20 The general pgf will be G(z) = zm/F(z), where
F(z) = zm
-
(1
-r)~aikl;A:kl=A,kl]Zm~~k,
k=l
F’(1)
=
m-f&,~[A(k’=A~k,],
k=l
F”(1) =
m(m-
1)
-2~(m-k)A:kI[Aik)=Alkl].
k=l
8.21 This is $\sum_{n\ge0}q_n$, where $q_n$ is the probability that the game between
Alice and Bill is still incomplete after $n$ flips. Let $p_n$ be the probability that
the game ends at the $n$th flip; then $p_n + q_n = q_{n-1}$. Hence the average time
to play the game is $\sum_{n\ge1}np_n = (q_0 - q_1) + 2(q_1 - q_2) + 3(q_2 - q_3) + \cdots =
q_0 + q_1 + q_2 + \cdots = N$, since $\lim_{n\to\infty}nq_n = 0$.
Another way to establish this answer is to replace H and T by $\frac12z$.
Then the derivative of the first equation in (8.78) tells us that $N(1) + N'(1) =
N'(1) + S_A'(1) + S_B'(1)$.
By the way, $N = \frac{16}{3}$.
8.22 By definition we have $V(X|Y) = E(X^2|Y) - \bigl(E(X|Y)\bigr)^2$ and $V(E(X|Y)) =
E\bigl((E(X|Y))^2\bigr) - \bigl(E(E(X|Y))\bigr)^2$; hence $E(V(X|Y)) + V(E(X|Y)) = E(E(X^2|Y)) -
\bigl(E(E(X|Y))\bigr)^2$. But $E(E(X|Y)) = EX$ and $E(E(X^2|Y)) = E(X^2)$, so the result is
just $VX$.
8.23 Let $\Omega_0 = \{⚀, ⚅\}^2$ and $\Omega_1 = \{⚁, ⚂, ⚃, ⚄\}^2$; and let $\Omega_2$ be the
other 16 elements of $\Omega$. Then $\Pr_{11}(\omega) - \Pr_{00}(\omega) = \frac{20}{576}, \frac{-7}{576}, \frac{2}{576}$ according
as $\omega \in \Omega_0, \Omega_1, \Omega_2$. The events $A$ must therefore be chosen with $k_j$ elements
from $\Omega_j$, where $(k_0, k_1, k_2)$ is one of the following: (0,0,0), (0,2,7),
(0,4,14), (1,4,4), (1,6,11), (2,6,1), (2,8,8), (2,10,15), (3,10,5), (3,12,12),
(4,12,2), (4,14,9), (4,16,16). For example, there are $\binom42\binom{16}{6}\binom{16}{1}$ events of type (2,6,1).
The total number of such events is $[z^0]\,(1 + z^{20})^4(1 + z^{-7})^{16}(1 + z^2)^{16}$, which
turns out to be 1304927002. If we restrict ourselves to events that depend
on S only, we get 40 solutions S
E’
A, where A =
0,
{
f2,
;b
, t},
{
f2, 5,9},
(2,12
,
PO,
z, 5,9}, {2,4,6,8,10,12},
{
f, ,7, z, 4,
lo},
and the complements of
these sets. (Here the notation
‘,:’
means either 2 or 12 but not both.)
8.24 (a) Any one of the dice ends up in J's possession with probability
$p = \frac16 + (\frac56)^2p$, hence $p = \frac{6}{11}$. Let $q = \frac{5}{11}$. Then the pgf for J's total holdings
is $(q + pz)^{2n+1}$, with mean $(2n+1)p$ and variance $(2n+1)pq$, by (8.61).
(b) $\binom53p^3q^2 + \binom54p^4q + \binom55p^5 = \frac{94176}{161051} \approx .585$.
This problem can perhaps be solved more easily without generating functions than with them.
8.25 The pgf for the current stake after $n$ rolls is $G_n(z)$, where
$$G_0(z) = z^A;$$
$$G_n(z) = \sum_{k=1}^{6}G_{n-1}(z^{2(k-1)/5})/6, \qquad\text{for } n > 0.$$
(The noninteger exponents cause no trouble.) It follows that Mean($G_n$) =
Mean($G_{n-1}$), and Var($G_n$) + Mean($G_n$)$^2$ = $\frac{22}{15}$(Var($G_{n-1}$) + Mean($G_{n-1}$)$^2$).
So the mean is always $A$, but the variance grows to $\bigl((\frac{22}{15})^n - 1\bigr)A^2$.
8.26 The pgf $F_{l,n}(z)$ satisfies $F'_{l,n}(z) = F_{l,n-l}(z)/l$; hence Mean($F_{l,n}$) =
$F'_{l,n}(1) = [n \ge l]/l$ and $F''_{l,n}(1) = [n \ge 2l]/l^2$; the variance is easily computed.
(In fact, we have
which approaches a Poisson distribution with mean
l/1
as n
+
co.)
8.27 $(n^2\Sigma_3 - 3n\Sigma_2\Sigma_1 + 2\Sigma_1^3)/n(n-1)(n-2)$ has the desired mean, where
$\Sigma_k = X_1^k + \cdots + X_n^k$. This follows from the identities
$$E(\Sigma_2\Sigma_1) = n\mu_3 + n(n-1)\mu_2\mu_1;$$
$$E(\Sigma_1^3) = n\mu_3 + 3n(n-1)\mu_2\mu_1 + n(n-1)(n-2)\mu_1^3.$$
Incidentally, the third cumulant is $\kappa_3 = E\bigl((X - EX)^3\bigr)$, but the fourth cumulant
does not have such a simple expression; we have $\kappa_4 = E\bigl((X - EX)^4\bigr) - 3(VX)^2$.
8.28 (The exercise implicitly calls for $p = q = \frac12$, but the general answer is given here for completeness.) Replace H by $pz$ and T by $qz$, getting $S_A(z) = p^2qz^3/(1-pz)(1-qz)(1-pqz^2)$ and $S_B(z) = pq^2z^3/(1-qz)(1-pqz^2)$. The pgf for the conditional probability that Alice wins at the $n$th flip, given that she wins the game, is
$\frac{S_A(z)}{S_A(1)} = z^3\cdot\frac{q}{1-pz}\cdot\frac{p}{1-qz}\cdot\frac{1-pq}{1-pqz^2}$.
This is a product of pseudo-pgf's, whose mean is $3 + p/q + q/p + 2pq/(1-pq)$. The formulas for Bill are the same but without the factor $q/(1-pz)$, so Bill's mean is $3 + q/p + 2pq/(1-pq)$. When $p = q = \frac12$, the answer in case (a) is $\frac{17}{3}$; in case (b) it is $\frac{14}{3}$. Bill wins only half as often, but when he does win he tends to win sooner. The overall average number of flips is $\frac23\cdot\frac{17}{3} + \frac13\cdot\frac{14}{3} = \frac{16}{3}$, agreeing with exercise 21. The solitaire game for each pattern has a waiting time of 8.
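The two conditional means can be checked numerically by following the game flip by flip. This is a rough sketch that assumes Alice's pattern is HHT and Bill's is HTT (the patterns implied by the numerators $p^2q$ and $pq^2$); it truncates the game after `max_flips` flips, which leaves only a negligible tail. The function name is illustrative.

```python
def conditional_means(p=0.5, max_flips=200):
    """Mean game length given that HHT wins, and given that HTT wins."""
    q = 1.0 - p
    unfinished = {"": 1.0}                     # last (up to two) flips -> probability
    prob = {"HHT": 0.0, "HTT": 0.0}            # probability that each pattern wins
    weighted = {"HHT": 0.0, "HTT": 0.0}        # sum of n * Pr(pattern wins at flip n)
    for n in range(1, max_flips + 1):
        nxt = {}
        for tail, pr in unfinished.items():
            for flip, pf in (("H", p), ("T", q)):
                last3 = tail + flip
                if last3 in prob:              # a pattern was just completed
                    prob[last3] += pr * pf
                    weighted[last3] += n * pr * pf
                else:
                    key = last3[-2:]
                    nxt[key] = nxt.get(key, 0.0) + pr * pf
        unfinished = nxt
    return weighted["HHT"] / prob["HHT"], weighted["HTT"] / prob["HTT"]
```

For a fair coin the two results should agree with $\frac{17}{3}$ and $\frac{14}{3}$ to within floating-point error.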
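8.29 Set $H = T = \frac12$ in
$1 + N(H+T) = N + S_A + S_B + S_C$
$N\,\mathrm{HHTH} = S_A(1+\mathrm{HTH}) + S_B(\mathrm{HTH}+\mathrm{TH}) + S_C(\mathrm{HTH}+\mathrm{TH})$
$N\,\mathrm{HTHH} = S_A(\mathrm{THH}+\mathrm{H}) + S_B(\mathrm{THH}+1) + S_C(\mathrm{THH})$
$N\,\mathrm{THHH} = S_A(\mathrm{HH}) + S_B(\mathrm{H}) + S_C$
to get the winning probabilities. In general we will have $S_A + S_B + S_C = 1$ and
$S_A(A{:}A) + S_B(B{:}A) + S_C(C{:}A) = S_A(A{:}B) + S_B(B{:}B) + S_C(C{:}B) = S_A(A{:}C) + S_B(B{:}C) + S_C(C{:}C)$.
In particular, the equations $9S_A + 3S_B + 3S_C = 5S_A + 9S_B + S_C = 2S_A + 4S_B + 8S_C$ imply that $S_A = \frac{16}{52}$, $S_B = \frac{17}{52}$, $S_C = \frac{19}{52}$.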
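The correlation equations above can be solved mechanically. The following Python sketch computes $X{:}Y$ as the sum of $2^{k-1}$ over all $k$ for which the last $k$ flips of $X$ equal the first $k$ flips of $Y$, then solves the resulting linear system exactly for a fair coin. The helper names `corr` and `winning_probabilities` are hypothetical.

```python
from fractions import Fraction

def corr(a, b):
    """a:b = sum of 2^(k-1) over k where the last k flips of a match the first k of b."""
    return sum(2 ** (k - 1)
               for k in range(1, min(len(a), len(b)) + 1) if a[-k:] == b[:k])

def winning_probabilities(patterns):
    """Solve sum_X S_X (X:Y) = same value for every Y, together with sum_X S_X = 1."""
    n = len(patterns)
    rows = []
    for j in range(1, n):                      # equate column 0 with column j
        rows.append([Fraction(corr(x, patterns[0]) - corr(x, patterns[j]))
                     for x in patterns] + [Fraction(0)])
    rows.append([Fraction(1)] * n + [Fraction(1)])
    for col in range(n):                       # Gauss-Jordan elimination
        piv = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[piv] = rows[piv], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [vr - f * vc for vr, vc in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]
```

Calling `winning_probabilities(["HHTH", "HTHH", "THHH"])` should reproduce the three probabilities stated above.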
8.30 The variance of $P(h_1,\ldots,h_n;k) \mid k$ is the variance of the shifted binomial distribution $((m-1+z)/m)^{k-1}z$, which is $(k-1)(\frac1m)(1-\frac1m)$ by (8.61). Hence the average of the variance is $(\mathrm{Mean}(S)-1)(m-1)/m^2$. The variance of the average is the variance of $(k-1)/m$, namely $\mathrm{Var}(S)/m^2$.
According to
(8.105), the sum of these two quantities should be VP, and it is. Indeed, we
have just replayed the derivation of
(8.~5)
in slight disguise. (See exercise 15.)
8.31 (a) A brute force solution would set up five equations in five unknowns:
$A = \frac12zB + \frac12zE$; $B = \frac12zC$; $C = 1 + \frac12zB + \frac12zD$; $D = \frac12zC + \frac12zE$; $E = \frac12zD$.
But positions $C$ and $D$ are equidistant from the goal, as are $B$ and $E$, so we can lump them together. If $X = B + E$ and $Y = C + D$, there are now three equations:
$A = \frac12zX$; $X = \frac12zY$; $Y = 1 + \frac12zX + \frac12zY$.
Hence $A = z^2/(4-2z-z^2)$; we have $\mathrm{Mean}(A) = 6$ and $\mathrm{Var}(A) = 22$. (Rings a bell? In fact, this problem is equivalent to flipping a fair coin until getting heads twice in a row: Heads means "advance toward the apple" and tails means "go back.") (b) Chebyshev's inequality says that $\Pr(S \ge 100) = \Pr((S-6)^2 \ge 94^2) \le 22/94^2 \approx .0025$. (c) The second tail inequality says that $\Pr(S \ge 100) \le 1/x^{98}(4-2x-x^2)$ for all $x \ge 1$, and we get the upper bound 0.00000005 when $x = (\sqrt{49001}-99)/100$. (The actual probability is approximately 0.0000000009, according to exercise 37.)
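The three numbers quoted in this answer (the moments, the tail bound, and the actual tail probability) can be reproduced directly from the pgf. The sketch below expands $A(z) = z^2/(4-2z-z^2)$ through the recurrence $4a_n = 2a_{n-1} + a_{n-2} + [n=2]$; the function names are illustrative.

```python
from fractions import Fraction
from math import sqrt

def hh_probabilities(n_max):
    """Exact coefficients a_n = Pr(S = n) of A(z) = z^2/(4 - 2z - z^2), for n <= n_max."""
    a = [Fraction(0)] * (n_max + 1)
    for n in range(2, n_max + 1):
        a[n] = (2 * a[n - 1] + a[n - 2] + (1 if n == 2 else 0)) / 4
    return a

def tail_bound(x):
    """Upper bound x^(-100) A(x) = 1/(x^98 (4 - 2x - x^2)) on Pr(S >= 100)."""
    return 1.0 / (x ** 98 * (4 - 2 * x - x * x))

def exact_tail(n):
    """Pr(S >= n), computed as 1 minus the probabilities of all shorter games."""
    return 1 - sum(hh_probabilities(n - 1))
```

Here `float(exact_tail(100))` lies just below $10^{-9}$, far under both the Chebyshev bound and the sharper bound `tail_bound((sqrt(49001) - 99) / 100)`.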
8.32 By symmetry, we can reduce each month's situation to one of four possibilities:
D, the states are diagonally opposite;
A, the states are adjacent and not Kansas;
K, the states are Kansas and one other;
S, the states are the same.
[Margin: "Toto, I have a feeling we're not in Kansas anymore." -Dorothy]
Considering the Markovian transitions, we get four equations
D = 1
+z($D++)
A =
z(;A+
AK)
4 4 4
K =
z(~D+~A+~~K)
S =
z(fD
+ ;A +
$K)
whose sum is $D + K + A + S = 1 + z(D + A + K)$. The solution is
$S = \frac{81z - 45z^2 - 4z^3}{243 - 243z + 24z^2 + 8z^3}$,
but the simplest way to find the mean and variance may be to write z = 1 + w
and expand in powers of w, ignoring multiples of w2:
D
=
g+‘593w+.,..
16 512
A
=
?+?115w+..,.’
8 256
K
=
fi+&i!iiw+.,..
8 256
Now
S’(1)
=
g
+
g
+
$
=
$,
and
is”(l)
=
s
+
z
+
$$!
=
w.
The mean is $\frac{75}{16}$ and the variance is $\frac{105}{4}$. (Is there a simpler way?)
8.33 First answer: Clearly yes, because the hash values $h_1, \ldots, h_n$ are independent. Second answer: Certainly no, even though the hash values $h_1, \ldots, h_n$ are independent. We have $\Pr(X_j = 0) = \sum_{k=1}^n s_k([j \ne k](m-1)/m) = (1-s_j)(m-1)/m$, but $\Pr(X_1 = X_2 = 0) = \sum_{k=1}^n s_k[k > 2](m-1)^2/m^2 = (1-s_1-s_2)(m-1)^2/m^2 \ne \Pr(X_1 = 0)\Pr(X_2 = 0)$.
8.34 Let $[z^n]\,S_m(z)$ be the probability that Gina has advanced $< m$ steps after taking $n$ turns. Then $S_m(1)$ is her average score on a par-$m$ hole; $[z^m]\,S_m(z)$ is the probability that she loses such a hole against a steady player; and $1 - [z^{m-1}]\,S_m(z)$ is the probability that she wins it. We have the recurrence
$S_0(z) = 0$;
$S_m(z) = (1 + pzS_{m-2}(z) + qzS_{m-1}(z))/(1-rz)$, for $m > 0$.
To solve part (a), it suffices to compute the coefficients for $m, n \le 4$; it is convenient to replace $z$ by $100w$ so that the computations involve nothing but integers. We obtain the following tableau of coefficients:

        w^0   w^1    w^2     w^3       w^4
S_0      0     0      0       0         0
S_1      1     4     16      64       256
S_2      1    95    744    4432     23552
S_3      1   100   9065  104044    819808
S_4      1   100   9975  868535  12964304

Therefore Gina wins with probability $1 - .868535 = .131465$; she loses with probability $.12964304$. (b) To find the mean number of strokes, we compute
$S_1(1) = \frac{25}{24}$; $S_2(1) = \frac{4675}{2304}$; $S_3(1) = \frac{667825}{221184}$; $S_4(1) = \frac{85134475}{21233664}$.
(Incidentally, $S_5(1) \approx 4.9995$; she wins with respect to both holes and strokes on a par-5 hole, but loses either way when par is 3.)
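The tableau can be regenerated with integer arithmetic. This sketch assumes the shot probabilities are $p = .05$, $q = .91$, $r = .04$ (values that reproduce the tableau entries; they are passed as percentages so that $z = 100w$ keeps everything integral). The function name `gina_tableau` is hypothetical.

```python
def gina_tableau(max_m=4, max_n=4, p=5, q=91, r=4):
    """Rows S_0..S_max_m of coefficients of w^0..w^max_n in S_m(100w)."""
    rows = [[0] * (max_n + 1)]                           # S_0 = 0
    for m in range(1, max_m + 1):
        prev1 = rows[m - 1]
        prev2 = rows[m - 2] if m >= 2 else [0] * (max_n + 1)
        row = []
        for n in range(max_n + 1):
            # (1 - r w) S_m = 1 + p w S_{m-2} + q w S_{m-1}
            if n == 0:
                row.append(1)
            else:
                row.append(p * prev2[n - 1] + q * prev1[n - 1] + r * row[n - 1])
        rows.append(row)
    return rows
```

Dividing the entry in row $m$, column $n$ by $100^n$ gives the probability that Gina has not yet holed out after $n$ strokes on a par-$m$ hole.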
8.35 The condition will be true for all $n$ if and only if it is true for $n = 1$, by the Chinese remainder theorem. One necessary and sufficient condition is the polynomial identity
$(p_2+p_4+p_6 + (p_1+p_3+p_5)w)(p_3+p_6 + (p_1+p_4)z + (p_2+p_5)z^2) = p_1wz + p_2z^2 + p_3w + p_4z + p_5wz^2 + p_6$,
but that just more-or-less restates the problem. A simpler characterization is
$(p_2+p_4+p_6)(p_3+p_6) = p_6$, $(p_1+p_3+p_5)(p_2+p_5) = p_5$,
which checks only two of the coefficients in the former product. The general solution has three degrees of freedom: Let $a_0 + a_1 = b_0 + b_1 + b_2 = 1$, and put $p_1 = a_1b_1$, $p_2 = a_0b_2$, $p_3 = a_1b_0$, $p_4 = a_0b_1$, $p_5 = a_1b_2$, $p_6 = a_0b_0$.
8.36 (a) The faces show 1, 2, 2, 3, 3, 4 spots on one die and 1, 3, 4, 5, 6, 8 spots on the other. (b) If the $k$th die has faces with $s_1, \ldots, s_6$ spots, let $p_k(z) = z^{s_1} + \cdots + z^{s_6}$. We want to find such polynomials with $p_1(z)\ldots p_n(z) = (z+z^2+z^3+z^4+z^5+z^6)^n$. The irreducible factors of this polynomial with rational coefficients are $z^n(z+1)^n(z^2+z+1)^n(z^2-z+1)^n$; hence $p_k(z)$ must be of the form $z^{a_k}(z+1)^{b_k}(z^2+z+1)^{c_k}(z^2-z+1)^{d_k}$. We must have $a_k \ge 1$, since $p_k(0) = 0$; and in fact $a_k = 1$, since $a_1 + \cdots + a_n = n$. Furthermore the condition $p_k(1) = 6$ implies that $b_k = c_k = 1$. It is now easy to see that $0 \le d_k \le 2$, since $d_k > 2$ gives negative coefficients. When $d = 0$ and $d = 2$, we get the two dice in part (a); therefore the only solutions have $k$ pairs of dice as in (a), plus $n - 2k$ ordinary dice, for some $k \le \frac12 n$.
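The factorization argument is easy to replay by machine. The sketch below builds $z(z+1)(z^2+z+1)(z^2-z+1)^d$ as a coefficient list and compares the sum distribution of the two unusual dice with that of an ordinary pair; all names are illustrative.

```python
from collections import Counter
from itertools import product

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (index = exponent)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def die_polynomial(d):
    """Coefficients of z (z+1) (z^2+z+1) (z^2-z+1)^d."""
    poly = poly_mul(poly_mul([0, 1], [1, 1]), [1, 1, 1])
    for _ in range(d):
        poly = poly_mul(poly, [1, -1, 1])
    return poly

def sum_distribution(*dice):
    """How many ways each total can be rolled with the given dice."""
    return Counter(sum(faces) for faces in product(*dice))

ORDINARY = (1, 2, 3, 4, 5, 6)
DIE_D0 = (1, 2, 2, 3, 3, 4)     # d = 0
DIE_D2 = (1, 3, 4, 5, 6, 8)     # d = 2
```

The coefficient lists for $d = 0, 1, 2$ are nonnegative and correspond to the three dice above, while $d = 3$ already produces a negative coefficient.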
8.37 The number of coin-toss sequences of length $n$ is $F_{n-1}$, for all $n > 0$, because of the relation between domino tilings and coin flips. Therefore the probability that exactly $n$ tosses are needed is $F_{n-1}/2^n$, when the coin is fair. Also $q_n = F_{n+1}/2^{n-1}$, since $\sum_{k\ge n} F_kz^k = (F_nz^n + F_{n-1}z^{n+1})/(1-z-z^2)$. (A systematic solution via generating functions is, of course, also possible.)
8.38 When $k$ faces have been seen, the task of rolling a new one is equivalent to flipping coins with success probability $p_k = (m-k)/m$. Hence the pgf is $\prod_{k=0}^{l-1} p_kz/(1-q_kz) = \prod_{k=0}^{l-1}(m-k)z/(m-kz)$. The mean is $\sum_{k=0}^{l-1} p_k^{-1} = m(H_m - H_{m-l})$; the variance is $m^2(H_m^{(2)} - H_{m-l}^{(2)}) - m(H_m - H_{m-l})$; and equation (7.47) provides a closed form for the requested probability, namely $m^{-n}m!\left\{{n-1 \atop l-1}\right\}/(m-l)!$. (The problem discussed in this exercise is traditionally called "coupon collecting.")
8.39 $E(X) = P(-1)$; $V(X) = P(-2) - P(-1)^2$; $E(\ln X) = -P'(0)$.
8.40 (a) We have $\kappa_m = n\left(0!\left\{{m \atop 1}\right\}p - 1!\left\{{m \atop 2}\right\}p^2 + 2!\left\{{m \atop 3}\right\}p^3 - \cdots\right)$, by (7.49). Incidentally, the third cumulant is $npq(q-p)$ and the fourth is $npq(1-6pq)$. The identity $q + pe^t = (p + qe^{-t})e^t$ shows that $f_m(p) = (-1)^mf_m(q) + [m=1]$; hence we can write $f_m(p) = g_m(pq)(q-p)^{[m\ \mathrm{odd}]}$, where $g_m$ is a polynomial of degree $\lfloor m/2\rfloor$, whenever $m > 1$. (b) Let $p = \frac12$ and $F(t) = \ln(\frac12 + \frac12e^t)$. Then $\sum_{m\ge1}\kappa_mt^{m-1}/(m-1)! = F'(t) = 1 - 1/(e^t+1)$, and we can use exercise 6.23.
8.41 If $G(z)$ is the pgf for a random variable $X$ that assumes only positive integer values, then $\int_0^1 G(z)\,dz/z = \sum_{k\ge1}\Pr(X=k)/k = E(X^{-1})$. If $X$ is the distribution of the number of flips to obtain $n+1$ heads, we have $G(z) = (pz/(1-qz))^{n+1}$ by (8.59), and the integral is
$\int_0^1\left(\frac{pz}{1-qz}\right)^{n+1}\frac{dz}{z} = \int_0^1\frac{w^n\,dw}{1+(q/p)w}$
if we substitute $w = pz/(1-qz)$. When $p = q$ the integrand can be written $(-1)^n((1+w)^{-1} - 1 + w - w^2 + \cdots + (-1)^nw^{n-1})$, so the integral is $(-1)^n(\ln 2 - 1 + \frac12 - \frac13 + \cdots + (-1)^n/n)$. We have $H_{2n} - H_n = \ln 2 - \frac14n^{-1} + \frac1{16}n^{-2} + O(n^{-4})$ by (9.28), and it follows that $E(X_{n+1}^{-1}) = \frac12n^{-1} - \frac14n^{-2} + O(n^{-4})$.
8.42 Let $F_n(z)$ and $G_n(z)$ be pgf's for the number of employed evenings, if the man is initially unemployed or employed, respectively. Let $q_h = 1 - p_h$ and $q_f = 1 - p_f$. Then $F_0(z) = G_0(z) = 1$, and
$F_n(z) = p_hzG_{n-1}(z) + q_hF_{n-1}(z)$;
$G_n(z) = p_fF_{n-1}(z) + q_fzG_{n-1}(z)$.
The solution is given by the super generating function
$G(w,z) = \sum_{n\ge0}G_n(z)w^n = A(w)/(1-zB(w))$,
where $B(w) = w(q_f - (q_f-p_h)w)/(1-q_hw)$ and $A(w) = (1-B(w))/(1-w)$. Now $\sum_{n\ge0}G'_n(1)w^n = \alpha w/(1-w)^2 + \beta/(1-w) - \beta/(1-(q_f-p_h)w)$ where
$\alpha = \frac{p_h}{p_h+p_f}$, $\beta = \frac{p_f(q_f-p_h)}{(p_h+p_f)^2}$;
hence $G'_n(1) = \alpha n + \beta(1-(q_f-p_h)^n)$. (Similarly $G''_n(1) = \alpha^2n^2 + O(n)$, so the variance is $O(n)$.)
8.43 $G_n(z) = \sum_{k\ge0}\left[{n \atop k}\right]z^k/n! = z^{\overline{n}}/n!$, by (6.11). This is a product of binomial pgf's, $\prod_{k=1}^n((k-1+z)/k)$, where the $k$th has mean $1/k$ and variance $(k-1)/k^2$; hence $\mathrm{Mean}(G_n) = H_n$ and $\mathrm{Var}(G_n) = H_n - H_n^{(2)}$.
8.44 (a) The champion must be undefeated in $n$ rounds, so the answer is $p^n$. (b,c) Players $x_1, \ldots, x_{2^k}$ must be "seeded" (by chance) in distinct subtournaments and they must win all $2^k(n-k)$ of their matches. The $2^n$ leaves of the tournament tree can be filled in $2^n!$ ways; to seed it we have $2^k!\,(2^{n-k})^{2^k}$ ways to place the top $2^k$ players, and $(2^n-2^k)!$ ways to place the others. Hence the probability is $(2p)^{2^k(n-k)}/\binom{2^n}{2^k}$. When $k = 1$ this simplifies to $(2p^2)^{n-1}/(2^n-1)$. (d) Each tournament outcome corresponds to a permutation of the players: Let $y_1$ be the champ; let $y_2$ be the other finalist; let $y_3$ and $y_4$ be the players who lost to $y_1$ and $y_2$ in the semifinals; let $(y_5, \ldots, y_8)$ be those who lost respectively to $(y_1, \ldots, y_4)$ in the quarterfinals; etc. (Another proof shows that the first round has $2^n!/2^{n-1}!$ essentially different outcomes; the second round has $2^{n-1}!/2^{n-2}!$; and so on.) (e) Let $S_k$ be the set of $2^{k-1}$ potential opponents of $x_2$ in the $k$th round. The conditional probability that $x_2$ wins, given that $x_1$ belongs to $S_k$, is
$\Pr(x_1\ \mathrm{plays}\ x_2)\cdot p^{n-1}(1-p) + \Pr(x_1\ \mathrm{doesn't\ play}\ x_2)\cdot p^n = p^{k-1}p^{n-1}(1-p) + (1-p^{k-1})p^n$.
The chance that $x_1 \in S_k$ is $2^{k-1}/(2^n-1)$; summing on $k$ gives the answer:
$\sum_{k=1}^n\frac{2^{k-1}}{2^n-1}\left(p^{k-1}p^{n-1}(1-p) + (1-p^{k-1})p^n\right) = p^n - \frac{(2p)^n-1}{2^n-1}\,p^{n-1}$.
(f) Each of the $2^n!$ tournament outcomes has a certain probability of occurring, and the probability that $x_j$ wins is the sum of these probabilities over all $(2^n-1)!$ tournament outcomes in which $x_j$ is champion. Consider interchanging $x_j$ with $x_{j+1}$ in all those outcomes; this change doesn't affect the probability if $x_j$ and $x_{j+1}$ never meet, but it multiplies the probability by $(1-p)/p < 1$ if they do meet.
8.45 (a) $A(z) = 1/(3-2z)$; $B(z) = zA(z)^2$; $C(z) = z^2A(z)^3$. The pgf for sherry when it's bottled is $z^3A(z)^3$, which is $z^3$ times a negative binomial distribution with parameters $n = 3$, $p = \frac13$. (b) $\mathrm{Mean}(A) = 2$, $\mathrm{Var}(A) = 6$; $\mathrm{Mean}(B) = 5$, $\mathrm{Var}(B) = 2\,\mathrm{Var}(A) = 12$; $\mathrm{Mean}(C) = 8$, $\mathrm{Var}(C) = 18$. The sherry is nine years old, on the average. The fraction that's 25 years old is $\binom{-3}{22}(-2)^{22}3^{-25} = \binom{24}{22}2^{22}3^{-25} = 23\cdot(\frac23)^{24} \approx .00137$. (c) Let the coefficient of $w^n$ be the pgf for the beginning of year $n$. Then
$A = (1 + \frac13w/(1-w))/(1-\frac23zw)$;
$B = (1 + \frac13zwA)/(1-\frac23zw)$;
$C = (1 + \frac13zwB)/(1-\frac23zw)$.
Differentiate with respect to $z$ and set $z = 1$; this makes
$C' = \frac{8}{1-w} - \frac{1/2}{(1-\frac23w)^3} - \frac{3/2}{(1-\frac23w)^2} - \frac{6}{1-\frac23w}$.
The average age of bottled sherry $n$ years after the process started is 1 greater than the coefficient of $w^{n-1}$, namely $9 - (\frac23)^n(3n^2+21n+72)/8$. (This already exceeds 8 when $n = 11$.)
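The closed form in part (c) can be compared with a direct year-by-year update of the mean ages. This is only a sketch under the assumption, read off from the equations for $A$, $B$, $C$, that every barrel starts with new wine, keeps two thirds of its contents each year, and receives one third from the barrel above it (or new wine, for the top barrel); the function name is hypothetical.

```python
from fractions import Fraction

def bottled_mean_age(n):
    """Mean age of the sherry bottled n years after the process starts (n >= 1)."""
    a = b = c = Fraction(0)                    # mean ages in barrels A, B, C at year 0
    third, two_thirds = Fraction(1, 3), Fraction(2, 3)
    for _ in range(n - 1):
        # everything ages one year; 1/3 of each barrel comes from the barrel above
        a, b, c = (two_thirds * (a + 1),
                   third * (a + 1) + two_thirds * (b + 1),
                   third * (b + 1) + two_thirds * (c + 1))
    return c + 1                               # wine drawn from C is a year older when bottled
```

The result should match $9 - (\frac23)^n(3n^2+21n+72)/8$ exactly, and it first exceeds 8 at $n = 11$.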
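8.46 (a) $P(w,z) = 1 + \frac12(wP(w,z) + zP(w,z)) = (1-\frac12(w+z))^{-1}$, hence $p_{m,n} = 2^{-m-n}\binom{m+n}{n}$. (b) $P_k(w,z) = \frac12(w^k+z^k)P(w,z)$; hence
$p_{k,m,n} = 2^{k-1-m-n}\left(\binom{m+n-k}{m} + \binom{m+n-k}{n}\right)$.
(c) $\sum_k k\,p_{k,n,n} = \sum_{k=0}^n k\,2^{k-2n}\binom{2n-k}{n} = \sum_{k=0}^n(n-k)2^{-n-k}\binom{n+k}{n}$; this can be summed using (5.20):
$\sum_{k=0}^n 2^{-n-k}\left((2n+1)\binom{n+k}{n} - (n+1)\binom{n+1+k}{n+1}\right) = (2n+1) - (n+1)2^{-n}\left(2^{n+1} - 2^{-n-1}\binom{2n+2}{n+1}\right) = \frac{2n+1}{2^{2n}}\binom{2n}{n} - 1$.
(The methods of Chapter 9 show that this is $2\sqrt{n/\pi} - 1 + O(n^{-1/2})$.)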
8.47 After $n$ irradiations there are $n + 2$ equally likely receptors. Let the random variable $X_n$ denote the number of diphages present; then $X_{n+1} = X_n + Y_n$, where $Y_n = -1$ if the $(n+1)$st particle hits a diphage receptor (conditional probability $2X_n/(n+2)$) and $Y_n = +2$ otherwise. Hence
$EX_{n+1} = EX_n + EY_n = EX_n - 2EX_n/(n+2) + 2(1 - 2EX_n/(n+2))$.
The recurrence $(n+2)EX_{n+1} = (n-4)EX_n + 2n + 4$ can be solved if we multiply both sides by the summation factor $(n+1)^{\underline{5}}$; or we can guess the answer and prove it by induction: $EX_n = (2n+4)/7$ for all $n > 4$. (Incidentally, there are always two diphages and one triphage after five steps, regardless of the configuration after four.)
8.48 (a) The distance between frisbees (measured so as to make it an even number) is either 0, 2, or 4 units, initially 4. The corresponding generating functions $A$, $B$, $C$ (where, say, $[z^n]\,C$ is the probability of distance 4 after $n$ throws) satisfy
$A = \frac14zB$, $B = \frac12zB + \frac14zC$, $C = 1 + \frac14zB + \frac34zC$.
It follows that $A = z^2/(16-20z+5z^2) = z^2/F(z)$, and we have $\mathrm{Mean}(A) = 2 - \mathrm{Mean}(F) = 12$, $\mathrm{Var}(A) = -\mathrm{Var}(F) = 100$. (A more difficult but more amusing solution factors $A$ as follows:
$A = \frac{p_1z}{1-q_1z}\cdot\frac{p_2z}{1-q_2z} = \frac{p_2}{p_2-p_1}\,\frac{p_1z}{1-q_1z} + \frac{p_1}{p_1-p_2}\,\frac{p_2z}{1-q_2z}$,
where $p_1 = \phi^2/4 = (3+\sqrt5)/8$, $p_2 = \hat\phi^2/4 = (3-\sqrt5)/8$, and $p_1 + q_1 = p_2 + q_2 = 1$. Thus, the game is equivalent to having two biased coins whose heads probabilities are $p_1$ and $p_2$; flip the coins one at a time until they have both come up heads, and the total number of flips will have the same distribution as the number of frisbee throws. The mean and variance of the waiting times for these two coins are respectively $6 \mp 2\sqrt5$ and $50 \mp 22\sqrt5$, hence the total mean and variance are 12 and 100 as before.)
(b) Expanding the generating function in partial fractions makes it possible to sum the probabilities. (Note that $\sqrt5/(4\phi) + \phi^2/4 = 1$, so the answer can be stated in terms of powers of $\phi$.) The game will last more than $n$ steps with probability $5^{(n-1)/2}4^{-n}(\phi^{n+2} - \phi^{-n-2})$; when $n$ is even this is $5^{n/2}4^{-n}F_{n+2}$. So the answer is $5^{50}4^{-100}F_{102} \approx .00005$.
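The closed form in part (b) can be checked against an exact iteration of the three-state chain from part (a). The sketch below tracks only the two non-absorbing states (distance 2 and distance 4); the function names are illustrative.

```python
from fractions import Fraction

def prob_game_exceeds(n):
    """Exact probability that the frisbees have not yet met after n throws."""
    b, c = Fraction(0), Fraction(1)            # Pr(distance 2), Pr(distance 4); start at 4
    for _ in range(n):
        # B = (1/2) z B + (1/4) z C,  C = 1 + (1/4) z B + (3/4) z C
        b, c = b / 2 + c / 4, b / 4 + 3 * c / 4
    return b + c

def fib(n):
    """Fibonacci number F_n with F_0 = 0, F_1 = 1."""
    x, y = 0, 1
    for _ in range(n):
        x, y = y, x + y
    return x
```

For even $n$ the value `prob_game_exceeds(n)` should coincide with $5^{n/2}4^{-n}F_{n+2}$; for $n = 100$ it is roughly $5.1\times10^{-5}$.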
8.49 (a) If $n > 0$, $P_N(0,n) = \frac12[N=0] + \frac14P_{N-1}(0,n) + \frac14P_{N-1}(1,n-1)$; $P_N(m,0)$ is similar; $P_N(0,0) = [N=0]$. Hence
$g_{m,n} = \frac14zg_{m-1,n+1} + \frac12zg_{m,n} + \frac14zg_{m+1,n-1}$;
$g_{0,n} = \frac12 + \frac14zg_{0,n} + \frac14zg_{1,n-1}$; etc.
(b) $g'_{m,n} = 1 + \frac14g'_{m-1,n+1} + \frac12g'_{m,n} + \frac14g'_{m+1,n-1}$; $g'_{0,n} = \frac12 + \frac14g'_{0,n} + \frac14g'_{1,n-1}$; etc. By induction on $m$, we have $g'_{m,n} = (2m+1)g'_{0,m+n} - 2m^2$ for all $m, n \ge 0$. And since $g'_{m,0} = g'_{0,m}$, we must have $g'_{m,n} = m + n + 2mn$. (c) The recurrence is satisfied when $mn > 0$, because
$\sin(2m+1)\theta = \frac{1}{\cos^2\theta}\left(\frac{\sin(2m-1)\theta}{4} + \frac{\sin(2m+1)\theta}{2} + \frac{\sin(2m+3)\theta}{4}\right)$;
this is a consequence of the identity $\sin(x-y) + \sin(x+y) = 2\sin x\cos y$. So all that remains is to check the boundary conditions.
8.50 (a) Using the hint, we get
$3\sum_k\binom{1/2}{k}\left(\frac89z\right)^k(1-z)^{2-k} = 3\sum_k\binom{1/2}{k}\left(\frac89\right)^k\sum_j\binom{k+j-3}{j}z^{j+k}$;
now look at the coefficient of $z^{3+l}$. (b) $H(z) = \frac23 + \frac5{27}z + \frac12\sum_{l\ge0}c_{3+l}z^{2+l}$. (c) Let $r = \sqrt{(1-z)(9-z)}$. One can show that $(z-3+r)(z-3-r) = 4z$, and hence that $(r/(1-z)+2)^2 = (13-5z+4r)/(1-z) = (9-H(z))/(1-H(z))$. (d) Evaluating the first derivative at $z = 1$ shows that $\mathrm{Mean}(H) = 1$. The second derivative diverges at $z = 1$, so the variance is infinite.
8.51 (a) Let $H_n(z)$ be the pgf for your holdings after $n$ rounds of play, with $H_0(z) = z$. The distribution for $n$ rounds is
$H_{n+1}(z) = H_n(H(z))$,
so the result is true by induction (using the amazing identity of the preceding problem). (b) $g_n = H_n(0) - H_{n-1}(0) = 4/n(n+1)(n+2) = 4(n-1)^{\underline{-3}}$. The mean is 2, and the variance is infinite. (c) The expected number of tickets you buy on the $n$th round is $\mathrm{Mean}(H_n) = 1$, by exercise 15. So the total expected number of tickets is infinite. (Thus, you almost surely lose eventually, and you expect to lose after the second game, yet you also expect to buy an infinite number of tickets.) (d) Now the pgf after $n$ games is $H_n(z)^2$, and the method of part (b) yields a mean of $16 - \frac43\pi^2 \approx 2.8$. (The sum $\sum_{k\ge1}1/k^2 = \pi^2/6$ shows up here.)
8.52 If $\omega$ and $\omega'$ are events with $\Pr(\omega) > \Pr(\omega')$, then a sequence of $n$ independent experiments will encounter $\omega$ more often than $\omega'$, with high probability, because $\omega$ will occur very nearly $n\Pr(\omega)$ times. Consequently, as $n \to \infty$, the probability approaches 1 that the median or mode of the values of $X$ in a sequence of independent trials will be a median or mode of the random variable $X$.
8.53 We can disprove the statement, even in the special case that each variable is 0 or 1. Let $p_0 = \Pr(X = Y = Z = 0)$, $p_1 = \Pr(X = Y = \bar Z = 0)$, \ldots, $p_7 = \Pr(\bar X = \bar Y = \bar Z = 0)$, where $\bar X = 1 - X$. Then $p_0 + p_1 + \cdots + p_7 = 1$, and the variables are independent in pairs if and only if we have
$(p_4+p_5+p_6+p_7)(p_2+p_3+p_6+p_7) = p_6+p_7$,
$(p_4+p_5+p_6+p_7)(p_1+p_3+p_5+p_7) = p_5+p_7$,
$(p_2+p_3+p_6+p_7)(p_1+p_3+p_5+p_7) = p_3+p_7$.
But $\Pr(X+Y = Z = 0) \ne \Pr(X+Y = 0)\Pr(Z = 0) \iff p_0 \ne (p_0+p_1)(p_0+p_2+p_4+p_6)$. One solution is
$p_0 = p_3 = p_5 = p_6 = 1/4$; $p_1 = p_2 = p_4 = p_7 = 0$.
This is equivalent to flipping two fair coins and letting $X$ = (the first coin is heads), $Y$ = (the second coin is heads), $Z$ = (the coins differ). Another example, with all probabilities nonzero, is
$p_0 = 4/64$, $p_1 = p_2 = p_4 = 5/64$, $p_3 = p_5 = p_6 = 10/64$, $p_7 = 15/64$.
For this reason we say that $n$ variables $X_1, \ldots, X_n$ are independent if
$\Pr(X_1 = x_1\ \mathrm{and}\ \ldots\ \mathrm{and}\ X_n = x_n) = \Pr(X_1 = x_1)\ldots\Pr(X_n = x_n)$;
pairwise independence isn't enough to guarantee this.
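Both counterexamples can be verified mechanically. In this sketch the list `p` is indexed so that `p[4*x + 2*y + z]` is $\Pr(X=x, Y=y, Z=z)$, matching the numbering $p_0, \ldots, p_7$ above; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def prob(p, **fixed):
    """Probability that the named variables (X, Y, Z) take the given 0/1 values."""
    total = Fraction(0)
    for x, y, z in product((0, 1), repeat=3):
        point = {"X": x, "Y": y, "Z": z}
        if all(point[name] == val for name, val in fixed.items()):
            total += p[4 * x + 2 * y + z]
    return total

def pairwise_independent(p):
    """True if every pair among X, Y, Z is independent."""
    for u, v in (("X", "Y"), ("X", "Z"), ("Y", "Z")):
        for a, b in product((0, 1), repeat=2):
            if prob(p, **{u: a, v: b}) != prob(p, **{u: a}) * prob(p, **{v: b}):
                return False
    return True

def sum_event_independent_of_z(p):
    """Checks only the events X + Y = 0 and Z = 0, as in the answer above."""
    return prob(p, X=0, Y=0, Z=0) == prob(p, X=0, Y=0) * prob(p, Z=0)

COINS = [Fraction(n, 4) for n in (1, 0, 0, 1, 0, 1, 1, 0)]
NONZERO = [Fraction(n, 64) for n in (4, 5, 5, 10, 5, 10, 10, 15)]
```

Both `COINS` and `NONZERO` pass `pairwise_independent` yet fail `sum_event_independent_of_z`, which is exactly the failure of independence between $X+Y$ and $Z$.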
8.54 (See exercise 27 for notation.) We have
$E(\Sigma_2^2) = n\mu_4 + n(n-1)\mu_2^2$;
$E(\Sigma_2\Sigma_1^2) = n\mu_4 + 2n(n-1)\mu_3\mu_1 + n(n-1)\mu_2^2 + n(n-1)(n-2)\mu_2\mu_1^2$;
$E(\Sigma_1^4) = n\mu_4 + 4n(n-1)\mu_3\mu_1 + 3n(n-1)\mu_2^2 + 6n(n-1)(n-2)\mu_2\mu_1^2 + n(n-1)(n-2)(n-3)\mu_1^4$;
it follows that $V(\hat VX) = \kappa_4/n + 2\kappa_2^2/(n-1)$.
8.55 There are $A = \frac1{17}\cdot52!$ permutations with $X = Y$, and $B = \frac{16}{17}\cdot52!$ permutations with $X \ne Y$. After the stated procedure, each permutation with $X = Y$ occurs with probability $\frac1{17}/((1-\frac{16}{17}p)A)$, because we return to step S1 with probability $\frac{16}{17}p$. Similarly, each permutation with $X \ne Y$ occurs with probability $\frac{16}{17}(1-p)/((1-\frac{16}{17}p)B)$. Choosing $p = \frac14$ makes $\Pr(X = x\ \mathrm{and}\ Y = y) = \frac1{169}$ for all $x$ and $y$. (We could therefore make two flips of a fair coin and go back to S1 if both come up heads.)
8.56 If $m$ is even, the frisbees always stay an odd distance apart and the game lasts forever. If $m = 2l + 1$, the relevant generating functions are
$G_m = \frac14zA_1$;
$A_1 = \frac12zA_1 + \frac14zA_2$;
$A_k = \frac14zA_{k-1} + \frac12zA_k + \frac14zA_{k+1}$, for $1 < k < l$;
$A_l = \frac14zA_{l-1} + \frac34zA_l + 1$.
(The coefficient $[z^n]\,A_k$ is the probability that the distance between frisbees is $2k$ after $n$ throws.) Taking a clue from the similar equations in exercise 49, we set $z = 1/\cos^2\theta$ and $A_1 = X\sin2\theta$, where $X$ is to be determined. It follows by induction (not using the equation for $A_l$) that $A_k = X\sin2k\theta$. Therefore we want to choose $X$ such that
$\left(1 - \frac{3}{4\cos^2\theta}\right)X\sin2l\theta = 1 + \frac{1}{4\cos^2\theta}X\sin(2l-2)\theta$.
It turns out that $X = 2\cos^2\theta/\sin\theta\cos(2l+1)\theta$, hence
$G_m = \frac{\cos\theta}{\cos m\theta}$.
The denominator vanishes when $\theta$ is an odd multiple of $\pi/(2m)$; thus $1 - q_kz$ is a root of the denominator for $1 \le k \le l$, and the stated product representation must hold. To find the mean and variance we can write
$G_m = (1 - \frac12\theta^2 + \frac1{24}\theta^4 - \cdots)/(1 - \frac12m^2\theta^2 + \frac1{24}m^4\theta^4 - \cdots)$
$= 1 + \frac12(m^2-1)\theta^2 + \frac1{24}(5m^4-6m^2+1)\theta^4 + \cdots$
$= 1 + \frac12(m^2-1)(\tan\theta)^2 + \frac1{24}(5m^4-14m^2+9)(\tan\theta)^4 + \cdots$
$= 1 + G'_m(1)(\tan\theta)^2 + \frac12G''_m(1)(\tan\theta)^4 + \cdots$,
because $\tan^2\theta = z - 1$ and $\tan\theta = \theta + \frac13\theta^3 + \cdots$. So we have $\mathrm{Mean}(G_m) = \frac12(m^2-1)$ and $\mathrm{Var}(G_m) = \frac16m^2(m^2-1)$.
[Margin: Trigonometry wins again. Is there a connection with pitching pennies along the angles of the $m$-gon?]
(Note that this implies the identities
$\frac{m^2-1}{2} = \sum_{k=1}^{(m-1)/2}\frac{1}{p_k} = \sum_{k=1}^{(m-1)/2}\left(1\Big/\sin\frac{(2k-1)\pi}{2m}\right)^2$;
$\frac{m^2(m^2-1)}{6} = \sum_{k=1}^{(m-1)/2}\left(\cot\frac{(2k-1)\pi}{2m}\Big/\sin\frac{(2k-1)\pi}{2m}\right)^2$.
The third cumulant of this distribution is $\frac1{30}m^2(m^2-1)(4m^2-1)$; but the pattern of nice cumulant factorizations stops there. There's a much simpler way to derive the mean: We have $G_m + A_1 + \cdots + A_l = z(A_1 + \cdots + A_l) + 1$, hence when $z = 1$ we have $G'_m = A_1 + \cdots + A_l$. Since $G_m = 1$ when $z = 1$, an easy induction shows that $A_k = 4k$.)
2’
and B:B < 2l
+ 2l
3
and B:A 3 2’
2,
hence
13:B
-
B:A 3 A:A
-
A:B is possible only if A:B >
21p3.
This means that
52
=
~3,
~1
=
~4,
~2
=
‘~5,
. , rr
3
= rr. But then A:A
zz
2’
+
2’
4
+
..I
A:B
z
2’
3
+
2’
6
+.
, B:A
z
2’
+
2’
5
+.
. . , and B:B
z
2’
-’
+
2’
4
+ . . .
;
hence B:B
-
B:A is less than A:A
-
A:B after all. (Sharper results have
been obtained by Guibas and Odlyzko
[138],
who show that Bill’s chances are
always maximized with one of the two patterns
H-r1
. . .
rl
I
or Trl
rl
,
.)
8.58 According to (8.82), we want $B{:}B - B{:}A > A{:}A - A{:}B$. One solution is $A$ = TTHH, $B$ = HHH.
8.59 (a) Two cases arise depending on whether
hk
# h, or
hk
= h,:
m-l
m-2+w+z
k-1
G(w,z)
=
---(
>
(
m-l+2
nmk
>
l
1
+-
m
(
rn-cwzJk
lwZ~m.:;S
k-lZ.z
(b) We can either argue algebraically, taking partial derivatives of $G(w,z)$ with respect to $w$ and $z$ and setting $w = z = 1$; or we can argue combinatorially: Whatever the values of $h_1, \ldots, h_{n-1}$, the expected value of $P(h_1, \ldots, h_{n-1}, h_n; n)$ is the same (averaged over $h_n$), because the hash sequence $(h_1, \ldots, h_{n-1})$ determines a sequence of list sizes $(n_1, n_2, \ldots, n_m)$ such that the stated expected value is $((n_1+1) + (n_2+1) + \cdots + (n_m+1))/m = (n-1+m)/m$. Therefore the random variable $EP(h_1, \ldots, h_n; n)$ is independent of $(h_1, \ldots, h_{n-1})$, hence independent of $P(h_1, \ldots, h_n; k)$.
8.60 If 1 6 k <
1
:$
n, the previous exercise shows that the coefficient of
sksr
in the variance of the average is zero. Therefore we need only consider
the coefficient of si, which is
t
Pih,,...,h,;k)2
-’
-(
t
1Sh1
,...I
h,,Sm
mn
l<h,
,...I
h,,<m
the variance of ((m
-
1 +
z)/m)
k~’
z; and this is (k
-
l)(m
-
1)/m’
as in
exercise 30.
8.61 The pgf $D_n(z)$ satisfies the recurrence
$D_0(z) = z$;
$D_n(z) = z^2D_{n-1}(z) + 2(1-z^3)D'_{n-1}(z)/(n+1)$, for $n > 0$.
We can now derive the recurrence
$D''_n(1) = (n-11)D''_{n-1}(1)/(n+1) + (8n-2)/7$,
which has the solution $\frac{2}{637}(n+2)(26n+15)$ for all $n \ge 11$ (regardless of initial conditions). Hence the variance $D''_n(1) + D'_n(1) - D'_n(1)^2$, with $D'_n(1) = (2n+4)/7$, comes to $\frac{108}{637}(n+2)$ for $n \ge 11$.
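The mean and variance can be confirmed by propagating the exact distribution of the number of diphages. The sketch below uses the transition rule of answer 8.47 (a hit on one of the $2X_n$ diphage receptors, out of $n+2$, changes the count by $-1$; any other hit changes it by $+2$) and assumes the process starts with a single diphage; the function names are illustrative.

```python
from fractions import Fraction

def diphage_distribution(n):
    """Exact distribution {count: probability} of diphages after n irradiations."""
    dist = {1: Fraction(1)}                    # assumed start: one diphage, two receptors
    for step in range(n):
        receptors = step + 2
        nxt = {}
        for x, pr in dist.items():
            hit = Fraction(2 * x, receptors)   # chance of hitting a diphage receptor
            if hit:
                nxt[x - 1] = nxt.get(x - 1, 0) + pr * hit
            if hit != 1:
                nxt[x + 2] = nxt.get(x + 2, 0) + pr * (1 - hit)
        dist = nxt
    return dist

def mean_and_variance(dist):
    mean = sum(x * p for x, p in dist.items())
    var = sum(x * x * p for x, p in dist.items()) - mean ** 2
    return mean, var
```

After five steps the distribution collapses to exactly two diphages, the mean follows $(2n+4)/7$ from then on, and the variance matches $\frac{108}{637}(n+2)$ once $n \ge 11$.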
8.62 (Another question asks if a given sequence of purported cumulants comes from any distribution whatever; for example, $\kappa_2$ must be nonnegative, and $\kappa_4 + 3\kappa_2^2 = E((X-\mu)^4)$ must be at least $(E((X-\mu)^2))^2 = \kappa_2^2$, etc. A necessary and sufficient condition for this other problem was found by Hamburger [6], [144].)
8.63 (Another question asks if there is a simple rule to tell whether H or T is preferable.) Conway conjectures that no such ties exist, and moreover that there is only one cycle in the directed graph on $2^l$ vertices that has an arc from each sequence to its "best beater."
9.1 True if the functions are all positive. But otherwise we might have, say, $f_1(n) = n^3 + n^2$, $f_2(n) = -n^3$, $g_1(n) = n^4 + n$, $g_2(n) = -n^4$.
9.2 (a) We have $n^{\ln n} \prec c^n \prec (\ln n)^n$, since $(\ln n)^2 \prec n\ln c \prec n\ln\ln n$. (b) $n^{\ln\ln\ln n} \prec (\ln n)! \prec n^{\ln\ln n}$. (c) Take logarithms to show that $(n!)!$ wins. (d) $F^2_{\lceil H_n\rceil} \asymp \phi^{2\ln n} = n^{2\ln\phi}$; $H_{F_n} \sim n\ln\phi$ wins because $\phi^2 = \phi + 1 < e$.
9.3 Replacing $kn$ by $O(n)$ requires a different $C$ for each $k$; but each $O$ stands for a single $C$. In fact, the context of this $O$ requires it to stand for a set of functions of two variables $k$ and $n$. It would be correct to write $\sum_{k=1}^n kn = \sum_{k=1}^n O(n^2) = O(n^3)$.
9.4 For example, $\lim_{n\to\infty} O(1/n) = 0$. On the left, $O(1/n)$ is the set of all functions $f(n)$ such that there are constants $C$ and $n_0$ with $|f(n)| \le C/n$ for all $n \ge n_0$. The limit of all functions in that set is 0, so the left-hand side is the singleton set $\{0\}$. On the right, there are no variables; 0 represents $\{0\}$, the (singleton) set of all "functions of no variables, whose value is zero." (Can you see the inherent logic here? If not, come back to it next year; you probably can still manipulate $O$-notation even if you can't shape your intuitions into rigorous formalisms.)
9.5 Let $f(n) = n^2$ and $g(n) = 1$; then $n$ is in the left set but not in the right, so the statement is false.
9.6 $n\ln n + \gamma n + O(\sqrt n\ln n)$.
9.7 $(1 - e^{-1/n})^{-1} = nB_0 - B_1 + B_2n^{-1}/2! + \cdots = n + \frac12 + O(n^{-1})$.
9.8 For example, let $f(n) = \lfloor n/2\rfloor!^2 + n$, $g(n) = (\lceil n/2\rceil - 1)!\,\lceil n/2\rceil! + n$. These functions, incidentally, satisfy $f(n) = O(ng(n))$ and $g(n) = O(nf(n))$; more extreme examples are clearly possible.
9.9 (For completeness, we assume that there is a side condition $n \to \infty$, so that two constants are implied by each $O$.) Every function on the left has the form $a(n) + b(n)$, where there exist constants $m_0$, $B$, $n_0$, $C$ such that $|a(n)| \le B|f(n)|$ for $n \ge m_0$ and $|b(n)| \le C|g(n)|$ for $n \ge n_0$. Therefore the left-hand function is at most $\max(B,C)(|f(n)| + |g(n)|)$, for $n \ge \max(m_0, n_0)$, so it is a member of the right side.
9.10 If $g(x)$ belongs to the left, so that $g(x) = \cos y$ for some $y$, where $|y| \le C|x|$ for some $C$, then $0 \le 1 - g(x) = 2\sin^2(y/2) \le \frac12y^2 \le \frac12C^2x^2$; hence the set on the left is contained in the set on the right, and the formula is true.
9.11 The proposition is true. For if, say, $|x| \le |y|$, we have $(x+y)^2 \le 4y^2$. Thus $(x+y)^2 = O(x^2) + O(y^2)$. Thus $O(x+y)^2 = O((x+y)^2) = O(O(x^2) + O(y^2)) = O(O(x^2)) + O(O(y^2)) = O(x^2) + O(y^2)$.
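9.12 $1 + 2/n + O(n^{-2}) = (1 + 2/n)(1 + O(n^{-2})/(1 + 2/n))$ by (9.26), and $1/(1 + 2/n) = O(1)$; now use (9.26).
9.13 $n^n(1 + 2n^{-1} + O(n^{-2}))^n = n^n\exp(n(2n^{-1} + O(n^{-2}))) = e^2n^n + O(n^{n-1})$.
9.14 It is $n^{n+\beta}\exp((n+\beta)(\alpha/n - \frac12\alpha^2/n^2 + O(n^{-3})))$.
9.15 $\ln\binom{3n}{n,n,n} = 3n\ln3 - \ln n + \frac12\ln3 - \ln2\pi + (\frac1{36} - \frac14)n^{-1} + O(n^{-3})$, so the answer is
$\frac{3^{3n+1/2}}{2\pi n}\left(1 - \frac29n^{-1} + \frac2{81}n^{-2} + O(n^{-3})\right)$.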
9.16 If $l$ is any integer in the range $a \le l < b$ we have
$\int_0^1 B(x)f(l+x)\,dx = \int_{1/2}^1 B(x)f(l+x)\,dx - \int_0^{1/2} B(1-x)f(l+x)\,dx = \int_{1/2}^1 B(x)(f(l+x) - f(l+1-x))\,dx$.
Since $l + x \ge l + 1 - x$ when $x \ge \frac12$, this integral is positive when $f(x)$ is nondecreasing.
9.17 $\sum_{m\ge0} B_m(\frac12)z^m/m! = ze^{z/2}/(e^z-1) = z/(e^{z/2}-1) - z/(e^z-1)$.
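9.18 The text's derivation for the case $\alpha = 1$ generalizes to give
$b_k(n) = \frac{2^{(2n+1/2)\alpha}}{(2\pi n)^{\alpha/2}}e^{-k^2\alpha/n}$, $c_k(n) = 2^{2n\alpha}n^{-(1+\alpha)/2+3\epsilon}e^{-k^2\alpha/n}$;
the answer is $2^{2n\alpha}(\pi n)^{(1-\alpha)/2}\alpha^{-1/2}(1 + O(n^{-1/2+3\epsilon}))$.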
9.19 $H_{10} = 2.928968254 \approx 2.928968256$; $10! = 3628800 \approx 3628712.4$; $B_{10} = 0.075757576 \approx 0.075757494$; $\pi(10) = 4 \approx 10.0017845$; $e^{0.1} = 1.10517092 \approx 1.10517083$; $\ln1.1 = 0.0953102 \approx 0.0953083$; $1.1111111 \approx 1.1111000$; $1.1^{0.1} = 1.00957658 \approx 1.00957643$. (The approximation to $\pi(n)$ gives more significant figures when $n$ is larger; for example, $\pi(10^9) = 50847534 \approx 50840742$.)
9.20 (a) Yes; the left side is $o(n)$ while the right side is equivalent to $O(n)$. (b) Yes; the left side is $e\cdot e^{O(1/n)}$. (c) No; the left side is about $\sqrt n$ times the bound on the right.
9.21 WehaveP,=m=n(lnm-1
-l/lnm+O(l/logn)2),
where
lnm = lnn+lnlnm-
l/lnn+lnlnn/(lnn)2
+O(l/logn)2;
lnlnn
(lnlnn)’
lnlnm =
1nlnn-t
-In
-
lnlnn
2(lnn)2
+-
(lnn)2
+
O(l/logn)‘.
It follows that
P,
= n
(
lnn+lnlnn-1
lnlnn-2
+
t(lnlnn)’
-
31nlnn
---
hi
n
(lnn)2
+
O(l/logn)’
.
)
(A slightly better approximation replaces this 0(
l/logn)’
by the quantity
-5/(lnn)’
+ O(loglogn/logn)3; then we estimate
P~OOOOOO
z
15483612.4.)
9.22 Replace
O(nzk)
by --&npLk +
O(n
4k)
in the expansion of
H,r;
this
replaces
O(t3(n2))
by
-h.E3(n2)
+ O(E:3(n4)) in (9.53). We have
,X3(n)
= ii-i-
+
&n,F2
+ O(np3),
hence the term
O(n2)
in
($1.54)
can be replaced by
-gnp2
+ O(n 3).
g.23 nhn =
toskcn
hk/(n~-k)
+ZcH,/(n+
l)(n+2).
Choose c =
enL/6
=
tkaogk
so
that
tka0
hk
:=
0 and h, = O(log n)/n3. The expansion of
t
OSk<n
hk/(n
-
k) as in (9.60) now yields nh, =
ZcH,/(n
+
l)(n
+ 2) +
O(n
m2),
hence
9
n=
en~/6
n+2lnn+O(l)
(
.n3
9.24
(a)
If
,&o(f(k)
1
< co and if
f(n
-
k)
=.
O(f(n)) when 0 6 k < n/2,
we have
L
akhk
=
r
O(f(k))O(f(n))
+
f
O(f(n))O(f(n
-
k)) ,
k=O k=O
k=n/2
which
is
2O(f(n)
tkzO
If(k)/),
so this case is proved. (b) But in this case if
a - b, = aPn, the convolution (n + 1 )aPn is not 0(
01
“).
n-
9.25
s,/(3t)
=
~;4Lq2n+l)F
w
e
may restrict the range of summation
to 0 < k 6 (logn)‘, say. In this range nk =
nk(l
-
(i)/n
+ O(k4/n2)) and
(2n +
l)k
= (2n)k(l +
(“;‘)/2n+
O(k4/n2)), so the summand is
Hence the sum over k is 2 -4/n + 0( 1 /n2). Stirling’s approximation can now
be applied to
(y)
= (3n)!/(2n)!n!, proving (9.2).
9.26 The minimum occurs at a term
Blm/(2m)
(2m-
1 )n2”-’ where 2m
z
2rrn + 3, and this term is approximately equal to 1
/(rceZnnfi
).
The absolute
error in Inn! is therefore too large to determine n! exactly by rounding to an
integer, when n is greater than about
e2n+‘.
9.27 We may assume that a #
-
f
n”+l
km
=
C,+
-
k=l
a+1
1.
Let f(x) = x”; the answer is
na-2k+l
+
0~~”
-2m
1).
(The constant
C,
turns out to be
<(-a),
which is in fact defined by this
formula when a > -1.)
9.28 Take f(x) = xlnx in Euler’s summation formula to get
A.
nnL:2+n/:+1/12e~n~i4(1
+
qn-2))
,
where A
z
1.282427 is “Glaisher’s constant!’
9.29 Let f(x) =
xP1
lnx. Then fiZmi (x) > 0 for all large x, and we can write
n
Ink
ET
=
y+lnS+z+Bn+,
0<8,<1,
k=l
where S
z
0.929772 is constant. Taking exponentials gives
(In general if f(x) =
X~
lnx, Euler’s summation formula applies as in exer-
cise 27, and the resulting constant is -<‘(-a) if a # -1. Thus, the theory of
the zeta function gives a closed form for Glaisher’s constant in the previous
exercise. We have
1nS
= yi in the notation of answer 9.57.)
9.30 Let g(x) =
xLePxL
and. f(x) =
g(x/fi).
Then n “’
,Yk>O
k’ePkz”’ is
,
.I
cc
f(x)
dx
-
f
%‘kP”(q
-
(-1
)-I
Oc’
h&4)
0
k=l
k!
0
,fl"'(x)
dx
=n
l/2
g(x)
dx
-
c
E!Lnlk~l)i2gik-l1(0) + 0(~-m/2).
k=,
k!
Since g(x) = x1
-
x2+‘/l ! + x4 ‘l/2!
-
x6+‘/3!
+.
. , the derivatives g imi (x) obey
a simple pattern, and the answer is
1,it+l)/2
r
1
+
(
>
Bt+l
b+3np’
Bt+6
2
-
-
2 2
(1+1)!0!
+
(l+3)!1!
-
(1+5)!2!
+Obp3)
9.31 The somewhat surprising identity
l/(cmmmk
+
cm)
+
l/(~"'+~
+
cm)
=
1 /cm makes the terms for 0 < k 6 2m sum to (m + +)/cm. The remaining
terms are
1 1
=-
C2m+l
_
C2m
-
C3m+2
_
C3m
+...
)
and this series can be truncated at any desired point, with an error not ex-
ceeding
the first omitted term.
9.32 $H_n^{(2)} = \pi^2/6 - 1/n + O(n^{-2})$ by Euler's summation formula, since we know the constant; and $H_n$ is given by (9.89). So the answer is
$ne^{\gamma+\pi^2/6}\left(1 - \frac12n^{-1} + O(n^{-2})\right)$.
[Margin: The world's top three constants, $(e, \pi, \gamma)$, all appear in this answer.]
9.33 We have $n^{\underline k}/n^{\overline k} = 1 - k(k-1)n^{-1} + \frac12k^2(k-1)^2n^{-2} + O(k^6n^{-3})$; dividing by $k!$ and summing over $k \ge 0$ yields $e - en^{-1} + \frac72en^{-2} + O(n^{-3})$.
9.34 A =
ey;
B = 0; C =
-.ie’;
D =
ieY(l
-y);
E = :eY; F =
&eY(3v+l).
9.35 Since $1/k(\ln k + O(1)) = 1/k\ln k + O(1/k(\log k)^2)$, the given sum is $\sum_{k=2}^n 1/k\ln k + O(1)$. The remaining sum is $\ln\ln n + O(1)$ by Euler's summation formula.
9.36 This works out beautifully with Euler’s summation formula:
dx
+L--
1
n
B2
-2x
n
n2
+ x2 2
n2
+ x2
o
+?(n2+x2)2
o
+ O(nm5)
Hence
S,
=
a7m-l
-- inP2
-
An3
+ O(nP5).
9.37 This is
k,q>l
=
n2-
1)
= ,2
-.
The remaining sum is like (9.55) but without the factor u(q). The same
method works here as it did there, but we get L(2) in place of
l/<(2),
SO the
answer comes to (1
-
g)nZ
+ O(nlogn).
9.38 Replace k by n
-
k and let ok(n) = (n
-
k)nPk(f;). Then In ok(n) =
nlnn
-
Ink!
-
k +
O(kn’),
and we can use tail-exchange with bk(n) =
nnePk/k!, ck(n) = kbk(n)/n,
D,
= {k 1 k < lnu}, to get
I&
ok(n) =
nne’/e(l +
O(n’)).
9.39 Tail-exchange with bk(n) = (Inn
-
k/n
-
ik2/n2)(lnn)k/k!,
ck(n) =
n3
(In n) k+3/k!,
D,
={k 1 0 < k <
10lnn).
When k x 1Olnn we have
k! x
fi(lO/e)k(lnn)k,
so the kth term is
O(n-
101n(lO/e)
logn). The answer
is nlnn-lnn-
t(lnn)(l
+lnn)/n+O(n~2(logn)3).
9.40 Combining terms two by two, we find that
H&-(H2k-&)m
= EHykP’
plus terms whose sum over all k > 1 is 0 (1). Suppose n is even. Euler’s
summation formula implies that
hence the sum is
i
H,”
+ 0 (1). In general the answer is
5
(-
9.41 Let
CX=
$/L$ = -@-2. We have
(In eYn)m
+0(l)
m
-l)nH,m -t O(1).
ClnFk
=
~(h~k-h&+h(l
-ak))
k=l
n(n
+ 1)
z
2
In@-5ln5+tln(l
-ak)-xln(l
-elk).
k21
k>n
The latter sum is tIk>,,
O(K~) =
O(~L~).
Hence the answer is
@+1/25-Wc
+
o&n’”
31/+-n/Z) ,
where
C = (1
-a)(1
-~~)(l
-K~)...
zz
1.226742.
9.42 The hint follows since
(,“,)/(z)
=
&
$
a
<
&.
Let
m =
lcxn]
=
om
~
E.
Then
n
<
(
>(
m
1+i~+(&)2+...)
=
(;)S.
so
1
ksa,,
(;)
=
(:)0(l),
.d
t
an i remains to estimate
(z).
By Stirling’s ap-
proximation we have In
(z)
=I
-i
1
nn-(an-e)ln(K-e/n)-((l--0()n+c)
x
ln(l-cx+c/n)+0(1)=-~lnn-omlna-(1-ol)nIn(l-cx)+0(1).
9.43 The denominator has factors of the form $z - \omega$, where $\omega$ is a complex root of unity. Only the factor $z - 1$ occurs with multiplicity 5. Therefore by (7.31), only one of the roots has a coefficient $\Omega(n^4)$, and the coefficient is $c = 5/(5!\cdot1\cdot5\cdot10\cdot25\cdot50) = 1/1500000$.
9.44 Stirling’s approximation says that
ln(xP”x!/(x-a)!)
has an asymptotic
series
-a-(x+i-a)ln(l-a/x)-&(x
‘-(x-o())‘)
-
&(x
3
-
(x
-
cc)
“)
-’
in which each coefficient of
xm~k
is a polynomial in
(x.
Hence x
“x!/(x
-
CX)!
=
Co(R)
+c1(a)x
+
...
+ c,(tx)xpn +
0(x-”
‘) as x
+
03, where c,,(a) is a
polynomial in
01.
We know that
c,
(
LX)
=
[,*,I
(-1)" whenever
01
is an integer,
and
LA1
is a polynomial in
01
of degree 2n; hence
c,
(
CX)
= [
&*,,I
(-1)” for
all real
01.
In other words, the asymptotic formulas
generalize equations (6.13) and
(6.11),
which hold in the all-integer case.
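9.45 Let the partial quotients of $\alpha$ be $\langle a_1, a_2, \ldots\rangle$, and let $\alpha_m$ be the continued fraction $1/(a_m + \alpha_{m+1})$ for $m \ge 1$. Then
$$D(\alpha,n) = D(\alpha_1,n) < D(\alpha_2,\lfloor\alpha_1n\rfloor) + a_1 + 3 < D(\alpha_3,\lfloor\alpha_2\lfloor\alpha_1n\rfloor\rfloor) + a_1 + a_2 + 6 < \cdots$$
$$< D(\alpha_{m+1},\lfloor\alpha_m\lfloor\ldots\lfloor\alpha_1n\rfloor\ldots\rfloor\rfloor) + a_1 + \cdots + a_m + 3m < \alpha_1\ldots\alpha_m n + a_1 + \cdots + a_m + 3m,$$
for all m. Divide by n and let $n \to \infty$; the limit is less than $\alpha_1\ldots\alpha_m$ for all m. Finally we have
$$\alpha_1\ldots\alpha_m = \frac{1}{K(a_1,\ldots,a_{m-1},a_m+\alpha_{m+1})} < \frac{1}{F_{m+1}}.$$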
9.46 For convenience we write just m instead of m(n). By Stirling's approximation, the maximum value of $k^n/k!$ occurs when $k \approx m \approx n/\ln n$, so we replace k by m + k and find that
$$\ln\frac{(m+k)^n}{(m+k)!} = n\ln m - m\ln m + m - \frac{\ln 2\pi m}{2} - \frac{(m+n)k^2}{2m^2} + O(k^3m^{-2}\log n).$$
Actually we want to replace k by $\lfloor m\rfloor + k$; this adds a further $O(km^{-1}\log n)$. The tail-exchange method with $|k| \le m^{1/2+\epsilon}$ now allows us to sum on k, giving a fairly sharp asymptotic estimate $b_n = \cdots$.
A truly Bell-shaped summand.
The requested formula follows, with relative error $O(\log\log n/\log n)$.
9.47 Let $\log_m n = l + \theta$, where $0 \le \theta < 1$. The floor sum is $l(n+1) + 1 - (m^{l+1}-1)/(m-1)$; the ceiling sum is $(l+1)n - (m^{l+1}-1)/(m-1)$; the exact sum is $(l+\theta)n - n/\ln m + O(\log n)$. Ignoring terms that are o(n), the difference between ceiling and exact is $(1 - f(\theta))n$, and the difference between exact and floor is $f(\theta)n$, where
$$f(\theta) = \frac{m^{1-\theta}}{m-1} + \theta - \frac{1}{\ln m}.$$
This function has maximum value $f(0) = f(1) = m/(m-1) - 1/\ln m$, and its minimum value is $\ln\ln m/\ln m + 1 - (\ln(m-1))/\ln m$. The ceiling value is closer when n is nearly a power of m, but the floor value is closer when $\theta$ lies somewhere between 0 and 1.
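The two closed forms in answer 9.47 can be compared against brute-force sums using exact integer arithmetic. This is an illustrative sketch; the helper names are hypothetical, and the floor and ceiling logarithms are computed by repeated multiplication to avoid floating-point rounding.

```python
def floor_log(m, k):
    """Largest integer l with m**l <= k, for k >= 1 and m >= 2."""
    l, power = 0, m
    while power <= k:
        l += 1
        power *= m
    return l

def ceil_log(m, k):
    """Smallest integer l with m**l >= k, for k >= 1 and m >= 2."""
    l, power = 0, 1
    while power < k:
        l += 1
        power *= m
    return l

def floor_sum_closed(m, n):
    """l(n+1) + 1 - (m^(l+1) - 1)/(m - 1), with l = floor(log_m n)."""
    l = floor_log(m, n)
    return l * (n + 1) + 1 - (m ** (l + 1) - 1) // (m - 1)

def ceil_sum_closed(m, n):
    """(l+1)n - (m^(l+1) - 1)/(m - 1), with l = floor(log_m n)."""
    l = floor_log(m, n)
    return (l + 1) * n - (m ** (l + 1) - 1) // (m - 1)

if __name__ == "__main__":
    m, n = 10, 1000
    print(sum(floor_log(m, k) for k in range(1, n + 1)), floor_sum_closed(m, n))
    print(sum(ceil_log(m, k) for k in range(1, n + 1)), ceil_sum_closed(m, n))
```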
9.48 Let $d_k = a_k + b_k$, where $a_k$ counts digits to the left of the decimal point. Then $a_k = 1 + \lfloor\log H_k\rfloor = \log\log k + O(1)$, where 'log' denotes $\log_{10}$. To estimate $b_k$, let us look at the number of decimal places necessary to distinguish y from nearby numbers $y - \epsilon$ and $y + \epsilon'$: Let $\delta = 10^{-b}$ be the length of the interval of numbers that round to $\hat y$. We have $|y - \hat y| \le \frac12\delta$; also $y - \epsilon < \hat y - \frac12\delta$ and $y + \epsilon' > \hat y + \frac12\delta$. Therefore $\epsilon + \epsilon' > \delta$. And if $\delta < \min(\epsilon, \epsilon')$, the rounding does distinguish $\hat y$ from both $y - \epsilon$ and $y + \epsilon'$. Hence $10^{-b_k} < 1/(k-1) + 1/k$ and $10^{1-b_k} \ge 1/k$; we have $b_k = \log k + O(1)$. Finally, therefore, $\sum_{k=1}^n d_k = \sum_{k=1}^n(\log k + \log\log k + O(1))$, which is $n\log n + n\log\log n + O(n)$ by Euler's summation formula.
9.49 We have $H_n > \ln n + \gamma + \frac12n^{-1} - \frac1{12}n^{-2} = f(n)$, where f(x) is increasing for all x > 0; hence if $n \ge e^{\alpha-\gamma}$ we have $H_n \ge f(e^{\alpha-\gamma}) > \alpha$. Also $H_{n-1} < \ln n + \gamma - \frac12n^{-1} = g(n)$, where g(x) is increasing for all x > 0; hence if $n \le e^{\alpha-\gamma}$ we have $H_{n-1} \le g(e^{\alpha-\gamma}) < \alpha$. Therefore $H_{n-1} < \alpha \le H_n$ implies that $e^{\alpha-\gamma} + 1 > n > e^{\alpha-\gamma} - 1$. (Sharper results have been obtained by Boas and Wrench [27].)
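As an illustration of the bound in answer 9.49, the sketch below finds the n with $H_{n-1} < \alpha \le H_n$ by direct summation and checks that it lies strictly between $e^{\alpha-\gamma} - 1$ and $e^{\alpha-\gamma} + 1$. The function name `first_n_reaching` is hypothetical, and the value of Euler's constant is entered as a floating-point literal.

```python
from math import exp

EULER_GAMMA = 0.5772156649015329   # Euler's constant, to double precision

def first_n_reaching(alpha):
    """Smallest n with H_n >= alpha, so that H_{n-1} < alpha <= H_n."""
    h, n = 0.0, 0
    while h < alpha:
        n += 1
        h += 1.0 / n
    return n

if __name__ == "__main__":
    for alpha in range(1, 11):
        n = first_n_reaching(alpha)
        center = exp(alpha - EULER_GAMMA)
        print(alpha, n, center - 1 < n < center + 1)
```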
9.50 (a) The expected return is $\sum_{1\le k\le N}k/(k^2H_N^{(2)}) = H_N/H_N^{(2)}$, and we want the asymptotic value to $O(N^{-1})$:
$$\frac{\ln N + \gamma + O(N^{-1})}{\pi^2/6 - N^{-1} + O(N^{-2})} = \frac{6\ln10}{\pi^2}\,n + \frac{6\gamma}{\pi^2} + \frac{36\ln10}{\pi^4}\,\frac{n}{10^n} + O(10^{-n}).$$
The coefficient $(6\ln10)/\pi^2 \approx 1.3998$ says that we expect about 40% profit.
(b) The probability of profit is $\sum_{n<k\le N}1/(k^2H_N^{(2)}) = 1 - H_n^{(2)}/H_N^{(2)}$, and since $H_n^{(2)} = \frac{\pi^2}{6} - n^{-1} + \frac12n^{-2} + O(n^{-3})$ this is
$$\frac{n^{-1} - \frac12n^{-2} + O(n^{-3})}{\pi^2/6 + O(N^{-1})} = \frac{6}{\pi^2}n^{-1} - \frac{3}{\pi^2}n^{-2} + O(n^{-3}),$$
actually decreasing with n. (The expected value in (a) is high because it includes payoffs so huge that the entire world's economy would be affected if they ever had to be made.)
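Both estimates in answer 9.50 can be compared with directly computed harmonic sums. The sketch below assumes $N = 10^n$, as the right-hand side of part (a) suggests. The function names are hypothetical.

```python
from math import log, pi

EULER_GAMMA = 0.5772156649015329   # Euler's constant, to double precision

def harmonic(limit, order=1):
    """H_limit^(order): the sum of 1/k^order for 1 <= k <= limit."""
    return sum(1.0 / k ** order for k in range(1, limit + 1))

def expected_return(n):
    """Part (a): H_N / H_N^(2) with N = 10^n (an assumption of this sketch)."""
    N = 10 ** n
    return harmonic(N) / harmonic(N, 2)

def expected_return_estimate(n):
    """The asymptotic value in part (a), without its O(10^-n) term."""
    return (6 * log(10) / pi ** 2) * n + 6 * EULER_GAMMA / pi ** 2 \
        + (36 * log(10) / pi ** 4) * n / 10 ** n

def profit_probability(n):
    """Part (b): 1 - H_n^(2) / H_N^(2) with N = 10^n."""
    return 1 - harmonic(n, 2) / harmonic(10 ** n, 2)

def profit_probability_estimate(n):
    """The asymptotic value in part (b), without its O(n^-3) term."""
    return 6 / (pi ** 2 * n) - 3 / (pi ** 2 * n ** 2)

if __name__ == "__main__":
    for n in range(1, 6):
        print(n, expected_return(n), expected_return_estimate(n),
              profit_probability(n), profit_probability_estimate(n))
```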
9.51 Strictly speaking, this is false, since the function represented by $O(x^{-2})$ might not be integrable. (It might be '$[x \in S]/x^2$', where S is not a measurable set.) But if we stipulate that f(x) is an integrable function such that $f(x) = O(x^{-2})$ as $x \to \infty$, then $\bigl|\int_n^\infty f(x)\,dx\bigr| \le \int_n^\infty|f(x)|\,dx \le \int_n^\infty Cx^{-2}\,dx = Cn^{-1}$.
(As opposed to an execrable function.)
9.52 In fact, the stack of n's can be replaced by any function f(n) that approaches infinity, however fast. Define the sequence $\langle m_0, m_1, m_2, \ldots\rangle$ by setting $m_0 = 0$ and letting $m_k$ be the least integer $> m_{k-1}$ such that
$$\Bigl(\frac{k+1}{k}\Bigr)^{m_k} \ge f(k+1)^2.$$
Now let $A(z) = \sum_{k\ge1}(z/k)^{m_k}$. This power series converges for all z, because the terms for $k > |z|$ are bounded by a geometric series. Also $A(n+1) \ge ((n+1)/n)^{m_n} \ge f(n+1)^2$, hence $\lim_{n\to\infty}f(n)/A(n) = 0$.
9.53 By induction, the O term is $(m-1)!^{-1}\int_0^x t^{m-1}f^{(m)}(x-t)\,dt$. Since $f^{(m+1)}$ has the opposite sign to $f^{(m)}$, the absolute value of this integral is bounded by $|f^{(m)}(0)|\int_0^x t^{m-1}\,dt$; so the error is bounded by the absolute value of the first discarded term.
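9.54 Let $g(x) = f(x)/x^\alpha$. Then $g'(x) \sim -\alpha g(x)/x$ as $x \to \infty$. By the mean value theorem, $g(x-\frac12) - g(x+\frac12) = -g'(y) \sim \alpha g(y)/y$ for some y between $x - \frac12$ and $x + \frac12$. Now $g(y) = g(x)(1 + O(1/x))$, so $g(x-\frac12) - g(x+\frac12) \sim \alpha g(x)/x = \alpha f(x)/x^{1+\alpha}$. Therefore
$$\sum_{k\ge n}\frac{f(k)}{k^{1+\alpha}} = O\Bigl(\sum_{k\ge n}\bigl(g(k-\tfrac12) - g(k+\tfrac12)\bigr)\Bigr) = O\bigl(g(n-\tfrac12)\bigr).$$
Sounds like a nasty theorem.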
9.55 The estimate of $(n+k+\frac12)\ln(1+k/n) + (n-k+\frac12)\ln(1-k/n)$ is extended to $k^2/n + k^4/6n^3 + O(n^{-3/2+5\epsilon})$, so we apparently want to have an extra factor $e^{-k^4/6n^3}$ in $b_k(n)$, and $c_k(n) = 2^{2n}n^{-2+5\epsilon}e^{-k^2/n}$. But it turns out to be better to leave $b_k(n)$ untouched and to let
$$c_k(n) = 2^{2n}n^{-2+5\epsilon}e^{-k^2/n} + 2^{2n}n^{-5+5\epsilon}k^4e^{-k^2/n},$$
thereby replacing $e^{-k^4/6n^3}$ by $1 + O(k^4/n^3)$. The sum $\sum_k k^4e^{-k^2/n}$ is $O(n^{5/2})$, as shown in exercise 30.
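9.56 If $k \le n^{1/2+\epsilon}$ we have $\ln(n^{\underline{k}}/n^k) = -\frac12k^2/n + \frac12k/n - \frac16k^3/n^2 + O(n^{-1+4\epsilon})$ by Stirling's approximation, hence
$$n^{\underline{k}}/n^k = e^{-k^2/2n}\bigl(1 + k/2n - \tfrac23k^3/(2n)^2 + O(n^{-1+4\epsilon})\bigr).$$
Summing with the identity in exercise 30, and remembering to omit the term for k = 0, gives
$$-1 + \sum_{k\ge0}e^{-k^2/2n}\Bigl(1 + \frac{k}{2n} - \frac23\,\frac{k^3}{(2n)^2}\Bigr) + O(n^{-1/2+4\epsilon}) = \sqrt{\frac{\pi n}{2}} - \frac13 + O(n^{-1/2+4\epsilon}).$$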
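The closing estimate of answer 9.56 is easy to test numerically. The sketch below sums $n^{\underline{k}}/n^k$ over $k \ge 1$ directly and compares the result with $\sqrt{\pi n/2} - \frac13$. The function name `q_sum` is hypothetical.

```python
from math import pi, sqrt

def q_sum(n):
    """Sum over k >= 1 of n(n-1)...(n-k+1)/n^k, accumulated term by term."""
    total, term = 0.0, 1.0
    for k in range(1, n + 1):
        term *= (n - k + 1) / n   # turns the term for k-1 into the term for k
        total += term
        if term < 1e-18:          # the remaining terms are negligible
            break
    return total

if __name__ == "__main__":
    for n in (100, 10000, 1000000):
        print(n, q_sum(n), sqrt(pi * n / 2) - 1 / 3)
```

The gap between the two columns should shrink roughly like $n^{-1/2}$.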
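9.57 Using the hint, the given sum becomes $\int_0^\infty ue^{-u}\zeta(1 + u/\ln n)\,du$. The zeta function can be defined by the series
$$\zeta(1+z) = z^{-1} + \sum_{m\ge0}(-1)^m\gamma_mz^m/m!,$$
where $\gamma_0 = \gamma$ and $\gamma_m$ is the Stieltjes constant
$$\gamma_m = \lim_{n\to\infty}\Bigl(\sum_{k=1}^n\frac{(\ln k)^m}{k} - \frac{(\ln n)^{m+1}}{m+1}\Bigr).$$
Hence, since $\int_0^\infty u^{m+1}e^{-u}\,du = (m+1)!$, the given sum is
$$\ln n + \gamma - 2\gamma_1(\ln n)^{-1} + 3\gamma_2(\ln n)^{-2} - \cdots.$$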
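9.58 Let $0 \le \theta \le 1$ and $f(z) = e^{2\pi iz\theta}/(e^{2\pi iz} - 1)$. Writing $z = x + iy$, we have
$$|f(z)| = \frac{e^{-2\pi y\theta}}{1 + e^{-2\pi y}} \le 1, \quad\text{when } x \bmod 1 = \tfrac12;$$
$$|f(z)| \le \frac{e^{-2\pi y\theta}}{|e^{-2\pi y} - 1|} \le \frac{1}{1 - e^{-2\pi\epsilon}}, \quad\text{when } |y| \ge \epsilon.$$
Therefore $|f(z)|$ is bounded on the contour, and the integral is $O(M^{1-m})$. The residue of $2\pi if(z)/z^m$ at $z = k \ne 0$ is $e^{2\pi ik\theta}/k^m$; the residue at z = 0 is the coefficient of $z^{-1}$ in
$$\frac{e^{2\pi iz\theta}}{z^{m+1}}\Bigl(B_0 + B_1\frac{2\pi iz}{1!} + \cdots\Bigr) = \frac{1}{z^{m+1}}\Bigl(B_0(\theta) + B_1(\theta)\frac{2\pi iz}{1!} + \cdots\Bigr),$$
namely $(2\pi i)^mB_m(\theta)/m!$. Therefore the sum of residues inside the contour is
$$\frac{(2\pi i)^m}{m!}B_m(\theta) + 2\sum_{k=1}^{M}\frac{e^{\pi im/2}\cos(2\pi k\theta - \pi m/2)}{k^m}.$$
This equals the contour integral $O(M^{1-m})$, so it approaches zero as $M \to \infty$.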
9.59 If F(x) is sufficiently well behaved, we have the general identity
$$\sum_kF(k+t) = \sum_nG(2\pi n)e^{2\pi int},$$
where $G(y) = \int_{-\infty}^{+\infty}e^{-iyx}F(x)\,dx$. (This is “Poisson's summation formula,” which can be found in standard texts such as Henrici [151, Theorem 10.6e].)
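For a concrete instance of the identity in answer 9.59, take the Gaussian $F(x) = e^{-x^2/n}$, whose transform under the stated definition is $G(y) = \sqrt{\pi n}\,e^{-ny^2/4}$. This choice of F is an assumption made only for illustration. The sketch below evaluates both sides with truncated sums; the function names and truncation limits are hypothetical.

```python
from math import cos, exp, pi, sqrt

def direct_sum(n, t, limit=60):
    """Left side: sum over k of F(k + t), with F(x) = exp(-x^2 / n)."""
    return sum(exp(-(k + t) ** 2 / n) for k in range(-limit, limit + 1))

def transformed_sum(n, t, limit=60):
    """Right side: sum of G(2 pi m) e^(2 pi i m t), G(y) = sqrt(pi n) exp(-n y^2 / 4).

    The imaginary parts cancel in pairs, so only cosines remain.
    """
    return sqrt(pi * n) * sum(
        exp(-pi * pi * m * m * n) * cos(2 * pi * m * t)
        for m in range(-limit, limit + 1)
    )

if __name__ == "__main__":
    for n, t in ((0.5, 0.3), (2.0, 0.0), (0.1, 0.75)):
        print(n, t, direct_sum(n, t), transformed_sum(n, t))
```

For small n the left side converges in a few terms, and for large n the right side does. That trade-off is the practical value of the formula.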
9.60 The stated formula is equivalent to
$$\cdots\; + \frac{5}{1024n^3} - \frac{21}{32768n^4} + O(n^{-5})$$
by exercise 5.22. Hence the result follows from exercises 6.64 and 9.44.
9.61 The idea is to make $\alpha$ “almost” rational. Let $a_k = 2^{2^{2^k}}$ be the kth partial quotient of $\alpha$, and let $n = \frac12a_{m+1}q_m$, where $q_m = K(a_1,\ldots,a_m)$ and m is even. Then $0 < \{q_m\alpha\} < 1/K(a_1,\ldots,a_{m+1}) < 1/(2n)$, and if we take $v = a_{m+1}/(4n)$ we get a discrepancy $\ge \frac14a_{m+1}$. If this were less than $n^{1-\epsilon}$ we would have
$$a_{m+1}^{\epsilon} = O(q_m^{1-\epsilon}),$$
but in fact $a_{m+1} > q_m^{2^m}$.
“The paradox is now fully established that the utmost abstractions are the true weapons with which to control our thought of concrete fact.” -A. N. Whitehead [304]
9.62 See Canfield [43]; see also David and Barton [60, Chapter 16] for asymptotics of Stirling numbers of both kinds.
9.63 Let $c = \phi^{2-\phi}$. The estimate $cn^{\phi-1} + o(n^{\phi-1})$ was proved by Fine [120]. Ilan Vardi observes that the sharper estimate stated can be deduced from the fact that the error term $e(n) = f(n) - cn^{\phi-1}$ satisfies the approximate recurrence $c^{\phi}n^{2-\phi}e(n) \approx -\sum_ke(k)[1 \le k < cn^{\phi-1}]$. The function
$$\frac{n^{\phi-1}u(\ln\ln n/\ln\phi)}{\ln n}$$
satisfies this recurrence asymptotically, if $u(x+1) = -u(x)$. (Vardi conjectures that
$$f(n) = n^{\phi-1}\bigl(c + u(\ln\ln n/\ln\phi)(\ln n)^{-1} + O((\log n)^{-2})\bigr)$$
for some such function u.) Calculations for small n show that f(n) equals the nearest integer to $cn^{\phi-1}$ for $1 \le n < 400$ except in one case: $f(273) = 39 > c\cdot273^{\phi-1} \approx 38.4997$. But the small errors are eventually magnified, because of results like those in exercise 2.36. For example, $e(201636503) \approx 35.73$; $e(919986484788) \approx -1959.07$.
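The small-n claim in answer 9.63 can be explored with a short computation. The sketch below assumes that f is Golomb's self-describing sequence from exercise 2.36, generated by the standard recurrence $f(1) = 1$, $f(n+1) = 1 + f(n+1-f(f(n)))$. The function name `golomb` is hypothetical.

```python
from math import sqrt

PHI = (1 + sqrt(5)) / 2
C = PHI ** (2 - PHI)        # the constant c of the answer

def golomb(limit):
    """Return a list f with f[n] equal to Golomb's sequence for 1 <= n <= limit."""
    f = [0, 1]              # f[0] is unused padding; f[1] = 1
    for n in range(1, limit):
        f.append(1 + f[n + 1 - f[f[n]]])
    return f

def estimate(n):
    """The leading term c * n^(phi - 1)."""
    return C * n ** (PHI - 1)

if __name__ == "__main__":
    f = golomb(399)
    mismatches = [n for n in range(1, 400) if f[n] != round(estimate(n))]
    print(mismatches)        # per the text, n = 273 is the only exception in this range
    print(f[273], estimate(273))
```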
9.64 (From this identity for $B_2(x)$ we can easily derive the identity of exercise 58 by induction on m.) If 0 < x < 1, the integral $\int_x^{1/2}\sin N\pi t\,dt/\sin\pi t$ can be expressed as a sum of N integrals that are each $O(N^{-2})$, so it is $O(N^{-1})$; the constant implied by this O may depend on x. Integrating the identity
$$\sum_{n=1}^{N}\cos2n\pi t = \Re\bigl(e^{2\pi it}(e^{2N\pi it}-1)/(e^{2\pi it}-1)\bigr) = -\tfrac12 + \tfrac12\sin(2N+1)\pi t/\sin\pi t$$
and letting $N \to \infty$ now gives $\sum_{n\ge1}(\sin2n\pi x)/n = \frac{\pi}{2} - \pi x$, a relation that Euler knew ([85'] and [88, part 2, §92]). Integrating again yields the desired formula. (This solution was suggested by E. M. E. Wermuth; Euler's original derivation did not meet modern standards of rigor.)
9.65 The expected number of distinct elements in the sequence 1, f(1), f(f(1)), ..., when f is a random mapping of {1, 2, ..., n} into itself, is the function Q(n) of exercise 56, whose value is $\frac12\sqrt{2\pi n} + O(1)$; this might account somehow for the factor $\sqrt{2\pi n}$.
9.66 It is known that $\ln\chi_n \sim \frac32n^2\ln\frac43$; the constant $e^{-\pi/6}$ has been verified empirically to eight significant digits.
9.67 This would fail if, for example, $e^{n-\gamma} = m + \frac12 + \epsilon/m$ for some integer m and some $0 < \epsilon < \frac1{24}$; but no counterexamples are known.
B
Bibliography
HERE ARE THE WORKS cited in this book. Numbers in the margin specify
the page numbers where citations occur.
References to published problems are generally made to the places where solutions can be found, instead of to the original problem statements, unless no solution has yet appeared in print.
“This paper fills a much-needed gap in the literature.” -Math. Reviews
1 N. H. Abel, letter to B. Holmboe (1823), in his Œuvres Complètes, first edition, 1839, volume 2, 264-265. Reprinted in the second edition, 1881, volume 2, 254-255.
2 Milton Abramowitz and Irene A. Stegun, editors, Handbook of Mathematical Functions. United States Government Printing Office, 1964. Reprinted by Dover, 1965.
3 William W. Adams and J. L. Davison, “A remarkable class of continued fractions,” Proceedings of the American Mathematical Society 65 (1977), 194-198.
4 A. V. Aho and N. J. A. Sloane, “Some doubly exponential sequences,” Fibonacci Quarterly 11 (1973), 429-437.
5 W. Ahrens, Mathematische Unterhaltungen und Spiele. Teubner, Leipzig, 1901. Second edition, in two volumes, 1910 and 1918.
6 Naum Il’ich Akhiezer, Klassicheskaia Problema Momentov i Nekotorye Voprosy Analiza, Sviazannye s Neiu. Moscow, 1961. English translation, The Classical Moment Problem and Some Related Questions in Analysis, Hafner, 1965.
7 R. E. Allardice and A. Y. Fraser, “La Tour d’Hanoï,” Proceedings of the Edinburgh Mathematical Society 2 (1884), 50-53.
8 Désiré André, “Sur les permutations alternées,” Journal de Mathématiques pures et appliquées, series 3, 7 (1881), 167-184.
9 George E. Andrews, “Applications of basic hypergeometric functions,” SIAM Review 16 (1974), 441-484.
10 George E. Andrews, “On sorting two ordered sets,” Discrete Mathematics 11 (1975), 97-106.
11 George E. Andrews, The Theory of Partitions. Addison-Wesley, 1976.
12 George E. Andrews and K. Uchimura, “Identities in combinatorics IV: Differentiation and harmonic numbers,” Utilitas Mathematica 28 (1985), 265-269.
13 M. D. Atkinson, “The cyclic towers of Hanoi,” Information Processing Letters 13 (1981), 118-119.
14 Paul Bachmann, Die analytische Zahlentheorie. Teubner, Leipzig, 1894.
15 W. N. Bailey, Generalized Hypergeometric Series. Cambridge University Press, 1935; second edition, 1964.
16 W. W. Rouse Ball and H. S. M. Coxeter, Mathematical Recreations and Essays, twelfth edition. University of Toronto Press, 1974. (A revision of Ball’s Mathematical Recreations and Problems, first published by Macmillan, 1892.)
17 P. Barlow, “Demonstration of a curious numerical proposition,” Journal of Natural Philosophy, Chemistry, and the Arts 27 (1810), 193-205.
18 Samuel Beatty, “Problem 3177,” American Mathematical Monthly 34 (1927), 159-160.
19 E. T. Bell, “Euler algebra,” Transactions of the American Mathematical Society 25 (1923), 135-154.
20 E. T. Bell, “Exponential numbers,” American Mathematical Monthly 41 (1934), 411-419.
21 Edward A. Bender, “Asymptotic methods in enumeration,” SIAM Review 16 (1974), 485-515.
22 Jacobi Bernoulli, Ars Conjectandi, opus posthumum. Basel, 1713. Reprinted in Die Werke von Jakob Bernoulli, volume 3, 107-286.
23 J. Bertrand, “Mémoire sur le nombre de valeurs que peut prendre une fonction quand on y permute les lettres qu’elle renferme,” Journal de l’École Royale Polytechnique 18, cahier 30 (1845), 123-140.
24 William H. Beyer, editor, CRC Standard Mathematical Tables, 25th edition. CRC Press, Boca Raton, Florida, 1978.
24' J. Bienaymé, “Considérations à l’appui de la découverte de Laplace sur la loi de probabilité dans la méthode des moindres carrés,” Comptes Rendus hebdomadaires des séances de l’Académie des Sciences (Paris) 37 (1853), 309-324.
25 J. Binet, “Mémoire sur l’intégration des équations linéaires aux différences finies, d’un ordre quelconque, à coefficients variables,” Comptes Rendus hebdomadaires des séances de l’Académie des Sciences (Paris) 17 (1843), 559-567.
26 Gunnar Blom, “Problem E 3043: Random walk until no shoes,” American Mathematical Monthly 94 (1987), 78-79.
27 R. P. Boas, Jr. and J. W. Wrench, Jr., “Partial sums of the harmonic series,” American Mathematical Monthly 78 (1971), 864-870.
28 P. Bohl, “Über ein in der Theorie der säkularen Störungen vorkommendes Problem,” Journal für die reine und angewandte Mathematik 135 (1909), 189-283.
29 P. du Bois-Reymond, “Sur la grandeur relative des infinis des fonctions,” Annali di Matematica pura ed applicata, series 2, 4 (1871), 338-353.
30 Émile Borel, Leçons sur les séries à termes positifs. Gauthier-Villars, 1902.
31 Jonathan M. Borwein and Peter B. Borwein, Pi and the AGM. Wiley, 1987.
32 Richard P. Brent, “The first occurrence of large gaps between successive primes,” Mathematics of Computation 27 (1973), 959-963.
33 Richard P. Brent, “Computation of the regular continued fraction for Euler’s constant,” Mathematics of Computation 31 (1977), 771-777.
34 John Brillhart, “Some miscellaneous factorizations,” Mathematics of Computation 17 (1963), 447-450.
35 Achille Brocot, “Calcul des rouages par approximation, nouvelle méthode,” Revue Chronométrique 6 (1860), 186-194. (He also published a 97-page monograph with the same title in 1862.)
36 Maxey Brooke and C. R. Wall, “Problem B-14: A little surprise,” Fibonacci Quarterly 1,3 (1963), 80.
37 Brother U. Alfred [Brousseau], “A mathematician’s progress,” Mathematics Teacher 59 (1966), 722-727.
38 Morton Brown, “Problem 6439: A periodic sequence,” American Mathematical Monthly 92 (1985), 218.
39 T. Brown, “Infinite multi-variable subpolynormal Woffles which do not satisfy the lower regular Q-property (Piffles),” in A Collection of 250 Papers on Woffle Theory Dedicated to R. S. Green on His 23rd Birthday. Cited in A. K. Austin, “Modern research in mathematics,” The Mathematical Gazette 51 (1967), 149-150.
40 Thomas C. Brown, “Problem E 2619: Squares in a recursive sequence,” American Mathematical Monthly 85 (1978), 52-53.
41 William G. Brown, “Historical note on a recurrent combinatorial problem,” American Mathematical Monthly 72 (1965), 973-977.
42 S. A. Burr, “On moduli for which the Fibonacci sequence contains a complete system of residues,” Fibonacci Quarterly 9 (1971), 497-504.
43 E. Rodney Canfield, “On the location of the maximum Stirling number(s) of the second kind,” Studies in Applied Mathematics 59 (1978), 83-93.
44 Lewis Carroll [pseudonym of C. L. Dodgson], Through the Looking Glass and What Alice Found There. Macmillan, 1871.
45 Jean-Dominique Cassini, “Une nouvelle progression de nombres,” Histoire de l’Académie Royale des Sciences, Paris, volume 1, 201. (Cassini’s work is summarized here as one of the mathematical results presented to the academy in 1680. This volume was published in 1733.)
46 E. Catalan, “Note sur une équation aux différences finies,” Journal de Mathématiques pures et appliquées 3 (1838), 508-516.
47 Augustin-Louis Cauchy, Cours d’analyse de l’École Royale Polytechnique. Imprimerie Royale, Paris, 1821. Reprinted in his Œuvres Complètes, series 2, volume 3.
48 Arnold Buffum Chace, The Rhind Mathematical Papyrus, volume 1. Mathematical Association of America, 1927. (Includes an excellent bibliography of Egyptian mathematics by R. C. Archibald.)
49 M. Chaimovich, G. Freiman, and J. Schönheim, “On exceptions to Szegedy’s theorem,” Acta Arithmetica 49 (1987), 107-112.
50 P. L. Tchebichef [Chebyshev], “Mémoire sur les nombres premiers,” Journal de Mathématiques pures et appliquées 17 (1852), 366-390. Reprinted in his Œuvres, volume 1, 51-70.
50' P. L. Chebyshev, “O srednikh velichinakh,” Matematicheskii Sbornik 2 (1867), 1-9. Reprinted in his Polnoe Sobranie Sochinenii, volume 2, 431-437. French translation, “Des valeurs moyennes,” Journal de Mathématiques pures et appliquées, series 2, 12 (1867), 177-184; reprinted in his Œuvres, volume 1, 685-694.
51 Th. Clausen, “Ueber die Fälle, wenn die Reihe von der Form $y = 1 + \frac{\alpha}{1}\cdot\frac{\beta}{\gamma}x + \frac{\alpha\cdot\alpha+1}{1\cdot2}\cdot\frac{\beta\cdot\beta+1}{\gamma\cdot\gamma+1}x^2 + \text{etc.}$ ein Quadrat von der Form $z = 1 + \frac{\alpha'}{1}\cdot\frac{\beta'}{\gamma'}\cdot\frac{\delta'}{\epsilon'}x + \frac{\alpha'\cdot\alpha'+1}{1\cdot2}\cdot\frac{\beta'\cdot\beta'+1}{\gamma'\cdot\gamma'+1}\cdot\frac{\delta'\cdot\delta'+1}{\epsilon'\cdot\epsilon'+1}x^2 + \text{etc.}$ hat,” Journal für die reine und angewandte Mathematik 3 (1828), 89-91.
52 Th. Clausen, “Beitrag zur Theorie der Reihen,” Journal für die reine und angewandte Mathematik 3 (1828), 92-95.
53 Th. Clausen, “Theorem,” Astronomische Nachrichten 17 (1840), columns 351-352.
54 Stuart Dodgson Collingwood, The Lewis Carroll Picture Book. T. Fisher Unwin, 1899. Reprinted by Dover, 1961, with the new title Diversions and Digressions of Lewis Carroll.
55 J. H. Conway and R. L. Graham, “Problem E 2567: A periodic recurrence,” American Mathematical Monthly 84 (1977), 570-571.
56 Harald Cramér, “On the order of magnitude of the difference between consecutive prime numbers,” Acta Arithmetica 2 (1937), 23-46.
57 A. L. Crelle, “Démonstration élémentaire du théorème de Wilson généralisé,” Journal für die reine und angewandte Mathematik 20 (1840), 29-56.
58 D. W. Crowe, “The n-dimensional cube and the Tower of Hanoi,” American Mathematical Monthly 63 (1956), 29-30.
59 D. R. Curtiss, “On Kellogg’s Diophantine problem,” American Mathematical Monthly 29 (1922), 380-387.
60 F. N. David and D. E. Barton, Combinatorial Chance. Hafner, 1962.
61 J. L. Davison, “A series and its associated continued fraction,” Proceedings of the American Mathematical Society 63 (1977), 29-32.
62 N. G. de Bruijn, Asymptotic Methods in Analysis. North-Holland, 1958; third edition, 1970. Reprinted by Dover, 1981.
63 N. G. de Bruijn, “Problem 9,” Nieuw Archief voor Wiskunde, series 3, 12 (1964), 68.
64 Abraham de Moivre, Miscellanea analytica de seriebus et quadraturis. London, 1730.
65 Leonard Eugene Dickson, History of the Theory of Numbers. Carnegie Institution of Washington, volume 1, 1919; volume 2, 1920; volume 3, 1923. Reprinted by Stechert, 1934, and by Chelsea, 1952, 1971.
66 Edsger W. Dijkstra, Selected Writings on Computing: A Personal Perspective. Springer-Verlag, 1982.
67 G. Lejeune Dirichlet, “Verallgemeinerung eines Satzes aus der Lehre von den Kettenbrüchen nebst einigen Anwendungen auf die Theorie der Zahlen,” Bericht über die Verhandlungen der Königlich-Preußischen Akademie der Wissenschaften zu Berlin (1842), 93-95. Reprinted in his Werke, volume 1, 635-638.
68 A. C. Dixon, “On the sum of the cubes of the coefficients in a certain expansion by the binomial theorem,” Messenger of Mathematics 20 (1891), 79-80.
69 John Dougall, “On Vandermonde’s theorem, and some more general expansions,” Proceedings of the Edinburgh Mathematical Society 25 (1907), 114-132.
70 A. Conan Doyle, “The sign of the four; or, The problem of the Sholtos,” Lippincott’s Monthly Magazine (Philadelphia) 45 (1890), 147-223.
71 A. Conan Doyle, “The adventure of the final problem,” The Strand Magazine 6 (1893), 558-570.
72 Henry Ernest Dudeney, The Canterbury Puzzles and Other Curious Problems. E. P. Dutton, New York, 1908; 4th edition, Dover, 1958. (Dudeney had first considered the generalized Tower of Hanoi in The Weekly Dispatch, on 25 May 1902 and 15 March 1903.)
73 G. Waldo Dunnington, Carl Friedrich Gauss: Titan of Science. Exposition Press, New York, 1955.
74 A. W. F. Edwards, Pascal’s Arithmetical Triangle. Oxford University Press, 1987.
75 G. Eisenstein, “Entwicklung von $\alpha^{\alpha^{\alpha^{\cdot^{\cdot^{\cdot}}}}}$,” Journal für die reine und angewandte Mathematik 28 (1844), 49-52. Reprinted in his Mathematische Werke 1, 122-125.
76 Erdős Pál, “Az $\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n} = \frac{a}{b}$ egyenlet egész számú megoldásairól,” Matematikai Lapok 1 (1950), 192-209. English abstract on page 210.
77 P. Erdős and R. L. Graham, Old and New Problems and Results in Combinatorial Number Theory. Université de Genève, L’Enseignement Mathématique, 1980.
78 P. Erdős, R. L. Graham, I. Z. Ruzsa, and E. G. Straus, “On the prime factors of $\binom{2n}{n}$,” Mathematics of Computation 29 (1975), 83-92.
79 Arulappah Eswarathasan and Eugene Levine, “p-integral harmonic sums,” Discrete Mathematics, to appear.
80 Euclid, ΣΤΟΙΧΕΙΑ. Ancient manuscript first printed in Basel, 1533. Scholarly edition (Greek and Latin) by J. L. Heiberg in five volumes, Teubner, Leipzig, 1883-1888.
81 Leonhard Euler, letter to Christian Goldbach (13 October 1729), in Correspondance Mathématique et Physique de Quelques Célèbres Géomètres du XVIIIème Siècle, edited by P. H. Fuss, St. Petersburg, 1843, volume 1, 3-7.
82 Leonhard Euler, “Methodus generalis summandi progressiones,” Commentarii academiæ scientiarum Petropolitanæ 6 (1732), 68-97. Reprinted in his Opera Omnia, series 1, volume 14, 42-72.
83 Leonhard Euler, “De progressionibus harmonicis observationes,” Commentarii academiæ scientiarum Petropolitanæ 7 (1734), 150-161. Reprinted in his Opera Omnia, series 1, volume 14, 87-100.
84 Leonhard Euler, “De fractionibus continuis, Dissertatio,” Commentarii academiæ scientiarum Petropolitanæ 9 (1737), 98-137. Reprinted in his Opera Omnia, series 1, volume 14, 187-215.
85 Leonhard Euler, “Variæ observationes circa series infinitas,” Commentarii academiæ scientiarum Petropolitanæ 9 (1737), 160-188. Reprinted in his Opera Omnia, series 1, volume 14, 216-244.
85' Leonhard Euler, letter to Christian Goldbach (4 July 1744), in Correspondance Mathématique et Physique de Quelques Célèbres Géomètres du XVIIIème Siècle, edited by P. H. Fuss, St. Petersburg, 1843, volume 1, 278-293.
86 Leonhard Euler, Introductio in Analysin Infinitorum. Tomus primus, Lausanne, 1748. Reprinted in his Opera Omnia, series 1, volume 8. Translated into French, 1786; German, 1788.
87 Leonhard Euler, “De partitione numerorum,” Novi commentarii academiæ scientiarum Petropolitanæ 3 (1750), 125-169. Reprinted in his Commentationes arithmeticæ collectæ, volume 1, 73-101. Reprinted in his Opera Omnia, series 1, volume 2, 254-294.
88 Leonhard Euler, Institutiones Calculi Differentialis cum eius usu in Analysi Finitorum ac Doctrina Serierum. Petrograd, Academiæ Imperialis Scientiarum, 1755. Reprinted in his Opera Omnia, series 1, volume 10. Translated into German, 1790.
89 Leonhard Euler, “Theoremata arithmetica nova methodo demonstrata,” Novi commentarii academiæ scientiarum Petropolitanæ 8 (1760), 74-104. (Also presented in 1758 to the Berlin Academy.) Reprinted in his Commentationes arithmeticæ collectæ, volume 1, 274-286. Reprinted in his Opera Omnia, series 1, volume 2, 531-555.
90 Leonhard Euler, “Specimen algorithmi singularis,” Novi commentarii academiæ scientiarum Petropolitanæ 9 (1762), 53-69. (Also presented in 1757 to the Berlin Academy.) Reprinted in his Opera Omnia, series 1, volume 15, 31-49.
91 Leonhard Euler, “Observationes analyticæ,” Novi commentarii academiæ scientiarum Petropolitanæ 11 (1765), 124-143. Reprinted in his Opera Omnia, series 1, volume 15, 50-69.
92 Leonhard Euler, Vollständige Anleitung zur Algebra. Erster Theil. Von den verschiedenen Rechnungs-Arten, Verhältnissen und Proportionen. St. Petersburg, 1770. Reprinted in his Opera Omnia, series 1, volume 1. Translated into Russian, 1768; Dutch, 1773; French, 1774; Latin, 1790; English, 1797.
93 Leonhard Euler, “Observationes circa bina biquadrata quorum summam in duo alia biquadrata resolvere liceat,” Novi commentarii academiæ scientiarum Petropolitanæ 17 (1772), 64-69. Reprinted in his Opera Omnia, series 1, volume 3, 211-217.
94 Leonhard Euler, “Observationes circa novum et singulare progressionum genus,” Novi commentarii academiæ scientiarum Petropolitanæ 20 (1775), 123-139. Reprinted in his Opera Omnia, series 1, volume 7, 246-261.
95 Leonhard Euler, “Specimen transformationis singularis serierum,” Nova acta academiæ scientiarum Petropolitanæ 12 (1794), 58-70. Submitted for publication in 1778. Reprinted in his Opera Omnia, series 1, volume 16(2), 41-55.
96 William Feller, An Introduction to Probability Theory and Its Applications, volume 1. Wiley, 1950; second edition, 1957; third edition, 1968.
97 Pierre de Fermat, letter to Marin Mersenne (25 December 1640), in Œuvres de Fermat, volume 2, 212-217.
98 Leonardo Fibonacci [Pisano], Liber Abaci. First edition, 1202 (now lost); second edition 1228. Reprinted in Scritti di Leonardo Pisano, edited by Baldassarre Boncompagni, 1857, volume 1.
99 Michael E. Fisher, “Statistical mechanics of dimers on a plane lattice,” Physical Review 124 (1961), 1664-1672.
100 R. A. Fisher, “Moments and product moments of sampling distributions,” Proceedings of the London Mathematical Society, series 2, 30 (1929), 199-238.
101 Pierre Forcadel, L’arithmeticque. Paris, 1557.
102 J. Fourier, “Refroidissement séculaire du globe terrestre,” Bulletin des Sciences par la Société philomathique de Paris, series 3, 7 (1820), 58-70. Reprinted in Œuvres de Fourier, volume 2, 271-288.
103 Aviezri S. Fraenkel, “Complementing and exactly covering sequences,” Journal of Combinatorial Theory, series A, 14 (1973), 8-20.
104 Aviezri S. Fraenkel, “How to beat your Wythoff games’ opponent on three fronts,” American Mathematical Monthly 89 (1982), 353-361.
105 J. S. Frame, B. M. Stewart, and Otto Dunkel, “Partial solution to problem 3918,” American Mathematical Monthly 48 (1941), 216-219.
106 Piero della Francesca, Libellus de quinque corporibus regularibus. Vatican Library, manuscript Urbinas 632. Translated into Italian by Luca Pacioli, as part 3 of Pacioli’s Diuine Proportione, Venice, 1509.
107 W. D. Frazer and A. C. McKellar, “Samplesort: A sampling approach to minimal storage tree sorting,” Journal of the ACM 17 (1970), 496-507.
108 Michael Lawrence Fredman, Growth Properties of a Class of Recursively Defined Functions. Ph.D. thesis, Stanford University, Computer Science Department, 1972.
109 Nikolaus I. Fuss, “Solutio quæstionis, quot modis polygonum n laterum in polygona m laterum, per diagonales resolvi queat,” Nova acta academiæ scientiarum Petropolitanæ 9 (1791), 243-251.
110 Martin Gardner, “About phi, an irrational number that has some remarkable geometrical expressions,” Scientific American 201,2 (August 1959), 128-134. Reprinted with additions in his book The 2nd Scientific American Book of Mathematical Puzzles & Diversions, 1961, 89-103.
111 Martin Gardner, “On the paradoxical situations that arise from nontransitive relations,” Scientific American 231,4 (October 1974), 120-124. Reprinted with additions in his book Time Travel and Other Mathematical Bewilderments, 1988, 55-69.
112 Martin Gardner, “From rubber ropes to rolling cubes, a miscellany of refreshing problems,” Scientific American 232,3 (March 1975), 112-114; 232,4 (April 1975), 130, 133. Reprinted with additions in his book Time Travel and Other Mathematical Bewilderments, 1988, 111-124.
113 Martin Gardner, “On checker jumping, the amazon game, weird dice, card tricks and other playful pastimes,” Scientific American 238,2 (February 1978), 19, 22, 24, 25, 30, 32.
114 J. Garfunkel, “Problem E 1816: An inequality related to Stirling’s formula,” American Mathematical Monthly 74 (1967), 202.
115 C. F. Gauss, Disquisitiones Arithmeticæ. Leipzig, 1801. Reprinted in his Werke, volume 1.
116 Carolo Friderico Gauss, “Disquisitiones generales circa seriem infinitam $1 + \frac{\alpha\beta}{1\cdot\gamma}x + \frac{\alpha(\alpha+1)\beta(\beta+1)}{1\cdot2\cdot\gamma(\gamma+1)}xx + \frac{\alpha(\alpha+1)(\alpha+2)\beta(\beta+1)(\beta+2)}{1\cdot2\cdot3\cdot\gamma(\gamma+1)(\gamma+2)}x^3 + \text{etc.}$ Pars prior,” Commentationes societatis regiæ scientiarum Gottingensis recentiores 2 (1813). (Thesis delivered to the Royal Society in Göttingen, 20 January 1812.) Reprinted in his Werke, volume 3, 123-163, together with an unpublished sequel on pages 207-229.
117 A. Genocchi, “Intorno all’ expressioni generali di numeri Bernoulliani,” Annali di Scienze Matematiche e Fisiche 3 (1852), 395-405.
118 Ira Gessel and Richard P. Stanley, “Stirling polynomials,” Journal of Combinatorial Theory, series A, 24 (1978), 24-33.
119 Jekuthiel Ginsburg, “Note on Stirling’s numbers,” American Mathematical Monthly 35 (1928), 77-80.
120 Solomon W. Golomb, “Problem 5407: A nondecreasing indicator function,” American Mathematical Monthly 74 (1967), 740-743.
121 Solomon W. Golomb, “The ‘Sales Tax’ theorem,” Mathematics Magazine 49 (1976), 187-189.
122 Solomon W. Golomb, “Problem E 2529: An application of ψ(x),” American Mathematical Monthly 83 (1976), 487-488.
123 I. J. Good, “Short proof of a conjecture by Dyson,” Journal of Mathematical Physics 11 (1970), 1884.
124 R. William Gosper, Jr., “Decision procedure for indefinite hypergeometric summation,” Proceedings of the National Academy of Sciences of the United States of America 75 (1978), 40-42.
125 R. L. Graham, “On a theorem of Uspensky,” American Mathematical Monthly 70 (1963), 407-409.
126 R. L. Graham, “A Fibonacci-like sequence of composite numbers,” Mathematics Magazine 37 (1964), 322-324.
127 R. L. Graham, “Problem 5749,” American Mathematical Monthly 77 (1970), 775.
588 BIBLIOGRAPHY
128 Ronald L. Graham, “Covering the positive integers by disjoint sets of
5~x1.
theform{[nol+fi]:n=1,2,...
},”
Journal of Combinatorial Theory,
series A, 15 (1973), 354-358.
129 R. L. Graham, “Problem 1242: Bijection between integers and
compos-
602.
ites,” Mathematics Magazine 60 (1987), 180.
130 R. L. Graham and D. E. Knuth, “Problem E 2982: A double infinite sum
602.
for
1x1,”
American Mathematical Monthly 96 (1989), 525-526.
131 Ronald L. Graham, Donald E. Knuth, and Oren Patashnik, Concrete
102.
Mathematics: A Foundation for Computer Science. Addison-Wesley,
1989. (The first printing had a different Iversonian notation.)
132 R. L. Graham and H. 0. Pollak, “Note on a nonlinear recurrence related
602.
to
fi,I’
Mathematics Magazine 43 (1970), 143-145.
133 Guido Grandi, letter to Leibniz (July 1713), in Leibnizens mathematische
58.
Schriften, volume 4, 215-217.
134 Daniel H. Greene and Donald E. Knuth, Mathematics for the Analysis
520, 605.
of Algorithms. Birkhguser, Boston, 1981; third edition, 1990.
135 Samuel L. Greitzer, International Mathematical Olympiads, 1959-1977.
602.
Mathematical Association of America, 1978.
136 Oliver A. Gross, “Preferential arrangements,” American Mathematical
604.
Monthly 69 (1962), 4-13.
137 Branko Grünbaum, “Venn diagrams and independent families of sets,”
484.
Mathematics Magazine 48 (1975), 12-23.
138 L. J. Guibas and A. M. Odlyzko, “String overlaps, pattern matching, and
565,
605
nontransitive games,” Journal of Combinatorial Theory, series A, 30
(1981), 183-208.
139 Richard K. Guy, Unsolved Problems in Number Theory. Springer-
510.
Verlag, 1981.
140 Marshall Hall, Jr., The Theory of Groups. Macmillan, 1959.
530.
141 P. R. Halmos, “How to write mathematics,” L’Enseignement
vi.
mathématique 16 (1970), 123-152. Reprinted in How to Write Mathematics,
American Mathematical Society, 1973, 19-48.
142 Paul R. Halmos, I Want to Be a Mathematician: An Automathography.
v.
Springer-Verlag, 1985. Reprinted by Mathematical Association of Amer-
ica, 1988.
143 G. H. Halphen, “Sur des suites de fractions analogues à la suite de Farey,”
291.
Bulletin de la Société mathématique de France 5 (1876), 170-175. Re-
printed in his Œuvres, volume 2, 102-107.
566.
v.
604.
42.
428, 605.
605.
111, 602.
286, 318, 576, 605.
603.
532.
603.
524, 603.
8.
603.
28.
603.
144 Hans Hamburger, “Über eine Erweiterung des Stieltjesschen Momenten-
problems,” Mathematische Annalen 81 (1920), 235-319; 82 (1921), 120-
164, 168-187.
145 J. M. Hammersley, “On the enfeeblement of mathematical skills by ‘Mod-
ern Mathematics’ and by similar soft intellectual trash in schools and
universities,” Bulletin of the Institute of Mathematics and its Applica-
tions 4,4 (October 1968), 66-85.
146 J. M. Hammersley, “An undergraduate exercise in manipulation,” The
Mathematical Scientist 14 (1989), 1-23.
147 Eldon R. Hansen, A Table of Series and Products. Prentice-Hall, 1975.
148 G. H. Hardy, Orders of Infinity: The ‘Infinitärcalcül’ of Paul du Bois-
Reymond. Cambridge University Press, 1910; second edition, 1924.
149 G. H. Hardy, “A mathematical theorem about golf,” The Mathematical
Gazette 29 (1944), 226-227. Reprinted in his Collected Papers, volume 7,
488.
150 G. H. Hardy and E. M. Wright, An Introduction to the Theory of Num-
bers. Clarendon Press, Oxford, 1938; fifth edition, 1979.
151 Peter Henrici, Applied and Computational Complex Analysis. Wiley,
volume 1, 1974; volume 2, 1977; volume 3, 1986.
152 Peter Henrici, “De Branges’ proof of the Bieberbach conjecture: A view
from computational analysis,” Sitzungsberichte der Berliner Mathema-
tischen Gesellschaft (1987), 105-121.
153 Charles Hermite, letter to C. W. Borchardt (8 September 1875), in Jour-
nal für die reine und angewandte Mathematik 81 (1876), 93-95. Re-
printed in his Œuvres, volume 3, 211-214.
154 Charles Hermite, Cours de M. Hermite. Faculté des Sciences de Paris,
1882. Third edition, 1887; fourth edition, 1891.
155 Charles Hermite, letter to S. Pincherle (10 May 1900), in Annali di
Matematica pura ed applicata, series 3, 5 (1901), 57-60. Reprinted in
his Œuvres, volume 4, 529-531.
156 I. N. Herstein and I. Kaplansky, Matters Mathematical. Harper & Row,
1974.
157 A. P. Hillman and V. E. Hoggatt, Jr., “A proof of Gould’s Pascal hexagon
conjecture,” Fibonacci Quarterly 10 (1972), 565-568, 598.
158 C. A. R. Hoare, “Quicksort,” The Computer Journal 5 (1962), 10-15.
159 L. C. Hsu, “Note on a combinatorial algebraic identity and its applica-
tion,” Fibonacci Quarterly 11 (1973), 480-484.
160 K. Inkeri, “Abschätzungen für eventuelle Lösungen der Gleichung im
509.
Fermatschen Problem,” Annales Universitatis Turkuensis, series A, 16, 1
(1953), 3-9.
161 Kenneth E. Iverson, A Programming Language. Wiley, 1962.
24, 67, 602.
162 C. G. J. Jacobi, Fundamenta nova theoriæ functionum ellipticarum.
64.
Königsberg, Bornträger, 1829. Reprinted in his Gesammelte Werke, vol-
ume 1, 49-239.
163 Dov Jarden and Theodor Motzkin, “The product of sequences with a
533.
common linear recursion formula of order 2,” Riveon Lematematika 3
(1949), 25-27, 38 (Hebrew with English summary). English version re-
printed in Dov Jarden, Recurring Sequences, Jerusalem, 1958, 42-45;
second edition, Jerusalem, 1966, 30-33.
164 Arne Jonassen and Donald E. Knuth, “A trivial algorithm whose analysis
520.
isn’t,” Journal of Computer and System Sciences 16 (1978), 301-322.
165 Bush Jones, “Note on internal merging,” Software: Practice and
175.
Experience 2 (1972), 241-243.
166 Flavius Josephus, ΙΣΤΟΡΙΑ ΙΟΥΔΑΙΚΟΥ ΠΟΛΕΜΟΥ ΠΡΟΣ
8.
ΡΩΜΑΙΟΥΣ. English translation, History of the Jewish War against the
Romans,
by H. St. J. Thackeray, in the Loeb Classical Library edition
of Josephus’s works, volumes 2 and 3, Heinemann, London, 1927-1928.
(The “Josephus problem” may be based on an early manuscript now pre-
served only in the Slavonic version; see volume 2, page xi, and volume 3,
page 654.)
167 R. Jungen, “Sur les séries de Taylor n’ayant que des singularités
604.
algébrico-logarithmiques sur leur cercle de convergence,” Commentarii
Mathematici Helvetici 3 (1931), 266-306.
168 I. Kaucký, “Problem E 2257: A harmonic identity,” American
604.
Mathematical Monthly 78 (1971), 908.
169 Murray S. Klamkin, International Mathematical Olympiads, 1978-1985,
602, 603.
and Forty Supplementary Problems. Mathematical Association of Amer-
ica, 1986.
170 Konrad Knopp, Theorie und Anwendung der unendlichen Reihen. Julius
605.
Springer, Berlin, 1922; second edition, 1924. Reprinted by Dover, 1945.
Fourth edition, 1947; fifth edition, 1964. English translation, Theory and
Application of Infinite Series, 1928; second edition, 1951.
171 Donald E. Knuth, “Euler’s constant to 1271 places,” Mathematics of
467.
Computation 16 (1962), 275-281.
172 Donald Knuth, “Transcendental numbers based on the Fibonacci se-
531.
quence,” Fibonacci Quarterly 2 (1964), 43-44, 52.
vi,
486, 499, 515,
548, 602, 603, 604,
605.
110, 128, 486, 602,
604, 605.
253, 397, 487, 603,
604, 605.
603.
605.
602.
540.
602.
602.
603.
604.
538.
532.
111.
213, 603.
173 Donald E. Knuth, The Art of Computer Programming, volume 1: Fun-
damental Algorithms. Addison-Wesley, 1968; second edition, 1973.
174 Donald E. Knuth, The Art of Computer Programming, volume 2:
Seminumerical Algorithms. Addison-Wesley, 1969; second edition, 1981.
175 Donald E. Knuth, The Art of Computer Programming, volume 3: Sorting
and Searching. Addison-Wesley, 1973; second printing, 1975.
176 Donald E. Knuth, “Problem E 2492: Some sum,” American Mathematical
Monthly 82 (1975), 855.
177 Donald E. Knuth, Mariages stables et leurs relations avec d’autres
problèmes combinatoires. Les Presses de l’Université de Montréal, 1976.
Revised and corrected edition, 1980.
178 Donald E. Knuth, The TeXbook. Addison-Wesley, 1984. Reprinted as
volume A of Computers & Typesetting, 1986.
179 Donald E. Knuth, “An analysis of optimum caching,” Journal of Algo-
rithms 6 (1985), 181-199.
180 Donald E. Knuth, Computers & Typesetting, volume D: METAFONT:
The Program. Addison-Wesley, 1986.
181 Donald E. Knuth, “Problem 1280,” Mathematics Magazine 60 (1987),
329.
182 Donald E. Knuth, “Problem E 3106: A new sum for n²,” American Math-
ematical Monthly 94 (1987), 795-797.
183 Donald E. Knuth, “Fibonacci multiplication,” Applied Mathematics Let-
ters 1 (1988), 57-60.
184 Donald E. Knuth, “A Fibonacci-like sequence of composite numbers,”
Mathematics Magazine 63 (1990), 21-25.
185 Donald E. Knuth and Thomas J. Buckholtz, “Computation of Tangent,
Euler, and Bernoulli numbers,” Mathematics of Computation 21 (1967),
663-688.
186 C. Kramp, Élémens d’arithmétique universelle. Cologne, 1808.
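187 E. E. Kummer, “Ueber die hypergeometrische Reihe
$1 + \frac{\alpha\cdot\beta}{1\cdot\gamma}\,x + \frac{\alpha(\alpha+1)\beta(\beta+1)}{1\cdot2\cdot\gamma(\gamma+1)}\,x^2 + \frac{\alpha(\alpha+1)(\alpha+2)\beta(\beta+1)(\beta+2)}{1\cdot2\cdot3\cdot\gamma(\gamma+1)(\gamma+2)}\,x^3 + \cdots$,”
Journal für die reine und angewandte Mathematik 15 (1836), 39-83,
127-172. Reprinted in his Collected Papers, volume 2, 75-166.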
188 E. E. Kummer, “Über die Ergänzungssätze zu den allgemeinen
603.
Reciprocitätsgesetzen,” Journal für die reine und angewandte Mathematik
44 (1852), 93-146. Reprinted in his Collected Papers, volume 1, 485-538.
189 R. P. Kurshan and B. Gopinath, “Recursively generated periodic
487.
sequences,” Canadian Journal of Mathematics 26 (1974), 1356-1371.
190 Thomas Fantet de Lagny, Analyse générale ou Méthodes nouvelles pour
290.
résoudre les problèmes de tous les genres et de tous les degrés à l’infini.
Published as volume 11 of Mémoires de l’Académie Royale des Sciences,
Paris, 1733.
191 J.-L. de la Grange [Lagrange], “Démonstration d’un théorème nouveau
604.
concernant les nombres premiers,” Nouveaux Mémoires de l’Académie
royale des Sciences et Belles-Lettres de Berlin (1771), 125-137. Reprinted
in his Œuvres, volume 3, 425-438.
192 J.-L. de la Grange [Lagrange], “Sur une nouvelle espèce de calcul relatif
456.
à la différentiation & à l’intégration des quantités variables,” Nouveaux
Mémoires de l’Académie royale des Sciences et Belles-Lettres de Berlin
(1772), 185-221. Reprinted in his Œuvres, volume 3, 441-476.
193 I. Lah, “Eine neue Art von Zahlen, ihre Eigenschaften und Anwendung
603.
in der mathematischen Statistik,” Mitteilungsblatt für Mathematische
Statistik 7 (1955), 203-212.
194 Edmund Landau, Handbuch der Lehre von der Verteilung der
434, 605.
Primzahlen, two volumes. Teubner, Leipzig, 1909.
195 Edmund Landau, Vorlesungen über Zahlentheorie, three volumes. Hirzel,
603.
Leipzig, 1927.
195’ P. S. de la Place [Laplace], “Mémoire sur les approximations des Formules
452.
qui sont fonctions de très-grands nombres,” Mémoires de l’Académie
royale des Sciences de Paris (1782), 1-88. Reprinted in his Œuvres
Complètes 10, 207-291.
196 Adrien-Marie Legendre, Essai sur la Théorie des Nombres. Paris, 1798;
602.
second edition, 1808. Third edition (retitled Théorie des Nombres, in two
volumes), 1830; fourth edition, Blanchard, 1955.
197 D. H. Lehmer, “Tests for primality by the converse of Fermat’s theorem,”
602.
Bulletin of the American Mathematical Society, series 2, 33 (1927), 327-
340.
198 D. H. Lehmer, “On Stern’s diatomic series,” American Mathematical
604.
Monthly 36 (1929), 59-67.
199 D. H. Lehmer, “On Euler’s totient function,” Bulletin of the American
511.
Mathematical Society, series 2, 38 (1932), 745-751.
168.
281.
605.
602.
604.
603.
536.
602, 603, 604.
603.
278, 603.
1.
602.
487.
455.
140.
280, 604.
200 G. W. Leibniz, letter to Johann Bernoulli (May 1695), in Leibnizens
mathematische Schriften, volume 3, 174-179.
201 C. G. Lekkerkerker, “Voorstelling van natuurlijke getallen door een som
van getallen van Fibonacci,” Simon Stevin 29 (1952), 190-195.
201’ Elliott H. Lieb, “Residual entropy of square ice,” Physical Review 162
(1967), 162-172.
202 B. F. Logan, “The recovery of orthogonal polynomials from a sum of
squares,” SIAM Journal on Mathematical Analysis 21 (1990), 1031-1050.
202’ B. F. Logan, “Polynomials related to the Stirling numbers,” AT&T Bell
Laboratories internal technical memorandum, August 10, 1987.
203 Calvin T. Long and Verner E. Hoggatt, Jr., “Sets of binomial coefficients
with equal products,” Fibonacci Quarterly 12 (1974), 71-79.
204 Sam Loyd, Cyclopedia of Puzzles. Franklin Bigelow Corporation, Morn-
ingside Press, New York, 1914.
205 E. Lucas, “Sur les rapports qui existent entre la théorie des nombres
et le Calcul intégral,” Comptes Rendus hebdomadaires des séances de
l’Académie des Sciences (Paris) 82 (1876), 1303-1305.
206 Edouard Lucas, “Sur les congruences des nombres eulériens et des coef-
ficients différentiels des fonctions trigonométriques, suivant un module
premier,” Bulletin de la Société mathématique de France 6 (1878), 49-54.
207 Edouard Lucas, Théorie des Nombres, volume 1. Gauthier-Villars, Paris,
1891.
208 Edouard Lucas, Récréations mathématiques, four volumes. Gauthier-
Villars, Paris, 1891-1894. Reprinted by Albert Blanchard, Paris, 1960.
(The Tower of Hanoi is discussed in volume 3, pages 55-59.)
209 R. C. Lyness, “Cycles,” The Mathematical Gazette 26 (1942), 62.
210 R. C. Lyness, “Cycles,” The Mathematical Gazette 29 (1945), 231-233.
211 Colin Maclaurin, Collected Letters, edited by Stella Mills. Shiva Pub-
lishing, Nantwich, Cheshire, 1982.
212 P. A. MacMahon, “Application of a theory of permutations in circular
procession to the theory of numbers,” Proceedings of the London Math-
ematical Society 23 (1892), 305-313.
213 Iu. V. Matiiasevich, “Diofantovost’ perechislimykh mnozhestv,” Doklady
Akademii Nauk SSSR 191 (1970), 279-282. English translation, with
amendments by the author, “Enumerable sets are diophantine,” Soviet
Mathematics 11 (1970), 354-357.
214 Z. A. Melzak, Companion to Concrete Mathematics. Volume 1,
vi.
Mathematical Techniques and Various Applications, Wiley, 1973; volume 2,
Mathematical Ideas, Modeling & Applications, Wiley, 1976.
215 N. S. Mendelsohn, “Problem E 2227: Divisors of binomial coefficients,”
603.
American Mathematical Monthly 78 (1971), 201.
216 W. H. Mills, “A prime representing function,” Bulletin of the American
603.
Mathematical Society, series 2, 53 (1947), 604.
217 A. Moessner, “Eine Bemerkung über die Potenzen der natürlichen
604.
Zahlen,” Sitzungsberichte der Mathematisch-Naturwissenschaftliche
Klasse der Bayerischen Akademie der Wissenschaften, 1951, Heft 3, 29.
218 Peter L. Montgomery, “Problem E2686: LCM of binomial coefficients,”
603
American Mathematical Monthly 86 (1979), 131.
219 Leo Moser, “Problem B-6: Some reflections,” Fibonacci Quarterly 1,4
~77
(1963), 75-76.
220 T. S. Motzkin and E. G. Straus, “Some combinatorial extremum prob-
539.
lems,” Proceedings of the American Mathematical Society 7 (1956),
1014-1021.
221 C. J. Mozzochi, “On the difference between consecutive primes,” Journal
510.
of Number Theory 24 (1986), 181-187.
222 B. R. Myers, “Problem 5795: The spanning trees of an n-wheel,” Amer-
604.
ican Mathematical Monthly 79 (1972), 914-915.
223 Isaac Newton, letter to John Collins (18 February 1670), in The Corre-
263.
spondence of Isaac Newton, volume 1, 27. Excerpted in The Mathemat-
ical Papers of Isaac Newton, volume 3, 563.
224 Ivan Niven, Diophantine Approximations. Interscience, 1963.
602.
225 Ivan Niven, “Formal power series,” American Mathematical Monthly 76
318.
(1969), 871-889.
226 Blaise Pascal, “De numeris multiplicibus,” presented to Académie
602.
Parisienne in 1654 and published with his Traité du triangle arithmétique [227].
Reprinted in Œuvres de Blaise Pascal, volume 3, 314-339.
227 Blaise Pascal, “Traité du triangle arithmétique,” in his Traité du Triangle
155, 156, 594.
Arithmétique, avec quelques autres petits traitez sur la mesme matière,
Paris, 1665. Reprinted in Œuvres de Blaise Pascal (Hachette, 1904-1914),
volume 3, 445-503; Latin editions from 1654 in volume 11, 366-390.
228 G. P. Patil, “On the evaluation of the negative binomial distribution with
605.
examples,” Technometrics 3 (1960), 501-505.
603.
510.
394.
604.
207.
603.
48.
605.
457.
604.
vi, 16, 494, 602.
313, 604.
605.
487.
604.
514.
229 C. S. Peirce, letter to E. S. Holden (January 1901). In The New Elements
of Mathematics, edited by Carolyn Eisele, Mouton, The Hague, 1976,
volume 1, 247-253. (See also page 211.)
230 C. S. Peirce, letter to Henry B. Fine (17 July 1903). In The New Elements
of Mathematics, edited by Carolyn Eisele, Mouton, The Hague, 1976,
volume 3, 781-784. (See also “Ordinals,” an unpublished manuscript
from circa 1905, in Collected Papers of Charles Sanders Peirce, volume 4,
268-280.)
231 Walter Penney, “Problem 95: Penney-Ante,” Journal of Recreational
Mathematics 7 (1974), 321.
232 J. K. Percus, Combinatorial Methods. Springer-Verlag, 1971.
233 J. F. Pfaff, “Observationes analyticæ ad L. Euleri institutiones calculi
integralis, Vol. IV, Supplem. II & IV,” Nova acta academiæ scientiarum
Petropolitanæ 11, Histoire section, 37-57. (This volume, printed in
1798, contains mostly proceedings from 1793, although Pfaff’s memoir
was actually received in 1797.)
234 L. Pochhammer, “Ueber hypergeometrische Functionen nter Ordnung,”
Journal für die reine und angewandte Mathematik 71 (1870), 316-352.
235 H. Poincaré, “Sur les fonctions à espaces lacunaires,” American Journal
of Mathematics 14 (1892), 201-221.
236 S. D. Poisson, “Mémoire sur le calcul numérique des intégrales définies,”
Mémoires de l’Académie Royale des Sciences de l’Institut de France,
series 2, 6 (1823), 571-602.
237 G. Pólya, “Kombinatorische Anzahlbestimmungen für Gruppen, Graphen
und chemische Verbindungen,” Acta Mathematica 68 (1937), 145-254.
238 George Pólya, Induction and Analogy in Mathematics. Princeton Uni-
versity Press, 1954.
239 G. Pólya, “On picture-writing,” American Mathematical Monthly 63
(1956), 689-697.
240 G. Pólya and G. Szegő, Aufgaben und Lehrsätze aus der Analysis, two
volumes. Julius Springer, Berlin, 1925; fourth edition, 1970 and 1971.
English translation, Problems and Theorems in Analysis, 1972 and 1976.
241 Bjorn Poonen, “Josephus sets.” Unpublished manuscript, 1987.
242 R. Rado, “A note on the Bernoullian numbers,” Journal of the London
Mathematical Society 9 (1934), 88-90.
242’ Earl D. Rainville, “The contiguous function relations for ${}_pF_q$ with appli-
cations to Bateman’s $J_n^{u,v}$ and Rice’s $H_n(\zeta, p, v)$,” Bulletin of the Amer-
ican Mathematical Society, series 2, 51 (1945), 714-723.
243 George N. Raney, “Functional composition patterns and power series re-
345, 604.
version,” Transactions of the American Mathematical Society 94 (1960),
441-451.
244 D. Rameswar Rao, “Problem E 2208: A divisibility problem,” American
602.
Mathematical Monthly 78 (1971), 78-79.
245 John William Strutt, Third Baron Rayleigh, The Theory of Sound. First
77.
edition, 1877; second edition, 1894. (The cited material about irrational
spectra is from section 92a of the second edition.)
246 Robert Recorde, The Whetstone of Witte. London, 1557.
432.
247 Simeon Reich, “Problem 6056: Truncated exponential-type series,”
605.
American Mathematical Monthly 84 (1977), 494-495.
248 Georges de Rham, “Un peu de mathématiques à propos d’une courbe
604.
plane,” Elemente der Mathematik 2 (1947), 73-76, 89-97. Reprinted in
his Œuvres Mathématiques, 678-689.
249 Paolo Ribenboim, 13 Lectures on Fermat’s Last Theorem. Springer-
509, 532, 603.
Verlag, 1979.
250 Bernhard Riemann, “Ueber die Darstellbarkeit einer Function durch
602.
eine trigonometrische Reihe,” Habilitationsschrift, Göttingen, 1854. Pub-
lished in Abhandlungen der mathematischen Classe der Königlichen
Gesellschaft der Wissenschaften zu Göttingen 13 (1868), 87-132. Re-
printed in his Gesammelte Mathematische Werke, 227-264.
251 Samuel Roberts, “On the figures formed by the intercepts of a system of
602.
straight lines in a plane, and on analogous relations in space of three di-
mensions,” Proceedings of the London Mathematical Society 19 (1889),
405-422.
252 Øystein Rødseth, “Problem E 2273: Telescoping Vandermonde
603.
convolutions,” American Mathematical Monthly 79 (1972), 88-89.
253 J. Barkley Rosser and Lowell Schoenfeld, “Approximate formulas for
111.
some functions of prime numbers,” Illinois Journal of Mathematics 6
(1962), 64-94.
254 Gian-Carlo Rota, “On the foundations of combinatorial theory. I. Theory
501.
of Möbius functions,” Zeitschrift für Wahrscheinlichkeitstheorie und
verwandte Gebiete 2 (1964), 340-368.
255 Ranjan Roy, “Binomial identities and hypergeometric series,” American
603.
Mathematical Monthly 94 (1987), 36-46.
256 Louis Saalschütz, “Eine Summationsformel,” Zeitschrift für Mathematik
603.
und Physik 35 (1890), 186-188.
526.
207.
279.
604.
604.
602.
604.
603.
259.
87.
603.
603.
604.
223.
269 Lucy Joan Slater, Generalized Hypergeometric Series. Cambridge Uni-
versity Press, 1966.
42, 327, 450.
270 N. J. A. Sloane, A Handbook of Integer Sequences. Academic Press, 1973.
256’ A. Sárközy, “On divisors of binomial coefficients, I,” Journal of Number
Theory 20 (1985), 70-80.
257 W. W. Sawyer, Prelude to Mathematics. Baltimore, Penguin, 1955.
258 O. Schlömilch, “Ein geometrisches Paradoxon,” Zeitschrift für Mathe-
matik und Physik 13 (1868), 162.
259 Ernst Schröder, “Vier combinatorische Probleme,” Zeitschrift für Mathe-
matik und Physik 15 (1870), 361-376.
260 Heinrich Schröter, “Ableitung der Partialbruch- und Produkt-Entwicke-
lungen für die trigonometrischen Funktionen,” Zeitschrift für Mathe-
matik und Physik 13 (1868), 254-259.
261 R. S. Scorer, P. M. Grundy, and C. A. B. Smith, “Some binary games,”
The Mathematical Gazette 28 (1944), 96-103.
262 J. Sedláček, “On the skeletons of a graph or digraph,” in Combinatorial
Structures and their Applications, Gordon and Breach, 1970, 387-391.
(This volume contains proceedings of the Calgary International Confer-
ence of Combinatorial Structures and their Applications, 1969.)
263 J. O. Shallit, “Problem 6450: Two series,” American Mathematical
Monthly 92 (1985), 513-514.
264 R. T. Sharp, “Problem 52: Overhanging dominoes,” Pi Mu Epsilon Jour-
nal 1,10 (1954), 411-412.
265 W. Sierpiński, “Sur la valeur asymptotique d’une certaine somme,” Bul-
letin International Académie Polonaise des Sciences et des Lettres (Cra-
covie), series A (1910), 9-11.
266 W. Sierpiński, “Sur les nombres dont la somme de diviseurs est une
puissance du nombre 2,” Calcutta Mathematical Society Golden Jubilee
Commemorative Volume (1958-1959), part 1, 7-9.
267 Wacław Sierpiński, A Selection of Problems in the Theory of Numbers.
Macmillan, 1964.
268 David L. Silverman, “Problematical Recreations 447: Numerical links,”
Aviation Week & Space Technology 89,10 (1 September 1968), 71. Re-
printed as Problem 147 in Second Book of Mathematical Bafflers, edited
by Angela Fox Dunn, Dover, 1983.
271 A. D. Solov’ev, “Odno kombinatornoe tozhdestvo i ego primenenie k
zadache o pervom nastuplenii redkogo sobytiia,” Teoriia veroiatnostei i
ee primeneniia 11 (1966), 313-320. English translation, “A combina-
torial identity and its application to the problem concerning the first
occurrence of a rare event,” Theory of Probability and its Applications
11 (1966), 276-282.
272 William G. Spohn, Jr., “Can mathematics be saved?” Notices of the
American Mathematical Society 16 (1969), 890-894.
273 Richard P. Stanley, “Differentiably finite power series,” European Jour-
nal of Combinatorics 1 (1980), 175-188.
274 Richard P. Stanley, “On dimer coverings of rectangles of fixed width,”
Discrete Applied Mathematics 12 (1985), 81-87.
275 Richard P. Stanley, Enumerative Combinatorics, volume 1. Wadsworth
& Brooks/Cole, 1986.
276 K. G. C. von Staudt, “Beweis eines Lehrsatzes, die Bernoullischen
Zahlen betreffend,” Journal für die reine und angewandte Mathematik
21 (1840), 372-374.
277 Guy L. Steele Jr., Donald R. Woods, Raphael A. Finkel, Mark R. Crispin,
Richard M. Stallman, and Geoffrey S. Goodfellow, The Hacker’s Dictio-
nary: A Guide to the World of Computer Wizards. Harper & Row, 1983.
278 J. Steiner, “Einige Gesetze über die Theilung der Ebene und des
Raumes,” Journal für die reine und angewandte Mathematik 1 (1826),
349-364. Reprinted in his Gesammelte Werke, volume 1, 77-94.
279 M. A. Stern, “Ueber eine zahlentheoretische Funktion,” Journal für die
reine und angewandte Mathematik 55 (1858), 193-220.
280 L. Stickelberger, “Ueber eine Verallgemeinerung der Kreistheilung,”
Mathematische Annalen 37 (1890), 321-367.
281 James Stirling, Methodus Differentialis. London, 1730. English transla-
tion, The Differential Method, 1749.
282 Dura W. Sweeney, “On the computation of Euler’s constant,” Mathe-
matics of Computation 17 (1963), 170-178.
283 J. J. Sylvester, “Problem 6919,” Mathematical Questions with their So-
lutions from the ‘Educational Times’ 37 (1882), 42-43, 80.
284 J. J. Sylvester, “On the number of fractions contained in any ‘Farey se-
ries’ of which the limiting number is given,” The London, Edinburgh
and Dublin Philosophical Magazine and Journal of Science, series 5, 15
(1883), 251-257. Reprinted in his Collected Mathematical Papers, vol-
ume 4, 101-109.
V.
b-05.
604.
519, 604, 605.
604.
124.
5. 602
116.
602.
192, 244, 283.
467.
602.
133.
510.
131.
604.
603.
383, 384.
605.
605.
266.
602.
169, 603.
484, 602.
604.
604.
604.
604.
279.
284’ M. Szegedy, “The solution of Graham’s greatest common divisor prob-
lem,” Combinatorica 6 (1986), 67-71.
285 Jonathan W. Tanner and Samuel S. Wagstaff, Jr., “New congruences for
the Bernoulli numbers,” Mathematics of Computation 48 (1987), 341-
350.
286 S. Tanny, “A probabilistic interpretation of Eulerian numbers,” Duke
Mathematical Journal 40 (1973), 717-722.
287 L. Theisinger, “Bemerkung über die harmonische Reihe,” Monatshefte
für Mathematik und Physik 26 (1915), 132-134.
288 T. N. Thiele, The Theory of Observations. Charles & Edwin Layton,
London, 1903. Reprinted in The Annals of Mathematical Statistics 2
(1931), 165-308.
289 E. C. Titchmarsh, The Theory of the Riemann Zeta-Function. Clarendon
Press, Oxford, 1951; second edition, revised by D. R. Heath-Brown, 1986.
290 F. G. Tricomi and A. Erdélyi, “The asymptotic expansion of a ratio of
gamma functions,” Pacific Journal of Mathematics 1 (1951), 133-142.
291 Peter Ungar, “Problem E3052: A sum involving Stirling numbers,”
American Mathematical Monthly 94 (1987), 185-186.
292 J.V. Uspensky, “On a problem arising out of the theory of a certain
game,” American Mathematical Monthly 34 (1927), 516-521.
293 A. Vandermonde, “Mémoire sur des irrationnelles de différens ordres avec
une application au cercle,” Histoire de l’Académie Royale des Sciences
(1772), part 1, 71-72; Mémoires de Mathématique et de Physique, Tirés
des Registres de l’Académie Royale des Sciences (1772), 489-498.
294 J. Venn, “On the diagrammatic and mechanical representation of propo-
sitions and reasonings,” The London, Edinburgh and Dublin Philosoph-
ical Magazine and Journal of Science, series 5, 9 (1880), 1-18.
295 John Wallis, A Treatise of Angular Sections. Oxford, 1684.
296 Edward Waring, Meditationes Algebraïcæ. Cambridge, 1770; third edi-
tion, 1782.
296’ William C. Waterhouse, “Problem E 3117: Even odder than we thought,”
American Mathematical Monthly 94 (1987), 691-692.
297 Frederick V. Waugh and Margaret W. Maxfield, “Side-and-diagonal num-
bers,” Mathematics Magazine 40 (1967), 74-83.
298 Warren Weaver, “Lewis Carroll and a geometrical paradox,” American
Mathematical Monthly 45 (1938), 234-236.
299 Louis Weisner, “Abstract theory of inversion of finite series,”
501.
Transactions of the American Mathematical Society 38 (1935), 474-484.
300 Hermann Weyl, “Über die Gibbs’sche Erscheinung und verwandte
87.
Konvergenzphänomene,” Rendiconti del Circolo Matematico di Palermo 30
(1910), 377-407.
301 F. J. W. Whipple, “Some transformations of generalized hypergeometric
603.
series,” Proceedings of the London Mathematical Society, series 2, 26
(1927), 257-272.
302 Alfred North Whitehead, An Introduction to Mathematics. London and
489.
New York, 1911.
303 Alfred North Whitehead, “Technical education and its relation to science
91.
and literature,” chapter 2 in The Organization of Thought, Educational
and Scientific, London and New York, 1917. Reprinted as chapter 4 of
The Aims of Education and Other Essays, New York, 1929.
304 Alfred North Whitehead, Science and the Modern World. New York,
577.
1925. Chapter 2 reprinted in The World of Mathematics, edited by
James R. Newman, 1956, volume 1, 402-416.
304’ Herbert S. Wilf, generatingfunctionology. Academic Press, 1990.
603.
305 H. C. Williams and H. Dubner, “The primality of R1031,” Mathematics
602.
of Computation 47 (1986), 703-711.
306 J. Wolstenholme, “On certain properties of prime numbers,” Quarterly
604.
Journal of Pure and Applied Mathematics 5 (1862), 35-39.
307 Derick Wood, “The Towers of Brahma and Hanoi revisited,” Journal of
602.
Recreational Mathematics 14 (1981), 17-24.
308 J. Worpitzky, “Studien über die Bernoullischen und Eulerschen Zahlen,”
255.
Journal für die reine und angewandte Mathematik 94 (1883), 203-232.
309 E. M. Wright, “A prime-representing function,” American Mathematical
602.
Monthly 58 (1951), 616-618; errata in 59 (1952), 99.
310 Hermann Zapf, collected works, entitled Hermann Zapf & His Design
viii.
Philosophy. Society of Typographic Arts, Chicago, 1987. (The AMS Euler
typeface is mentioned on pages 97 and 136.)
311 Derek A. Zave, “A series expansion involving the harmonic numbers,”
604.
Information Processing Letters 5 (1976), 75-77.
312 E. Zeckendorf, “Représentation des nombres naturels par une somme de
281.
nombres de Fibonacci ou de nombres de Lucas,” Bulletin de la Société
Royale des Sciences de Liège 41 (1972), 179-182.
C
Credits for Exercises
The TA sessions were invaluable, I mean really great.
Keep the same instructor and the same TAs next year.
Class notes good and useful.
I never “got” Stirling numbers.
THE EXERCISES in this book have been drawn from many sources. The
authors have tried to trace the origins of all the problems that have been
published before, except in cases where the exercise is so elementary that its
inventor would probably not think anything was being invented.
Many of the exercises come from examinations in Stanford’s Concrete
Mathematics classes. The teaching assistants and instructors often devised
new problems for those exams, so it is appropriate to list their names here:
Year  Instructor      Teaching Assistant(s)
1970  Don Knuth       Vaughan Pratt
1971  Don Knuth       Leo Guibas
1973  Don Knuth       Henson Graves, Louis Jouaillec
1974  Don Knuth       Scot Drysdale, Tom Porter
1975  Don Knuth       Mark Brown, Luis Trabb Pardo
1976  Andy Yao        Mark Brown, Lyle Ramshaw
1977  Andy Yao        Yossi Shiloach
1978  Frances Yao     Yossi Shiloach
1979  Ron Graham      Frank Liang, Chris Tong, Mark Haiman
1980  Andy Yao        Andrei Broder, Jim McGrath
1981  Ron Graham      Oren Patashnik
1982  Ernst Mayr      Joan Feigenbaum, Dave Helmbold
1983  Ernst Mayr      Anna Karlin
1984  Don Knuth       Oren Patashnik, Alex Schaffer
1985  Andrei Broder   Pang Chen, Stefan Sharkansky
1986  Don Knuth       Arif Merchant, Stefan Sharkansky
In addition, David Klarner (1971), Bob Sedgewick (1974), Leo Guibas (1975),
and Lyle Ramshaw (1979) each contributed to the class by giving six or more
guest lectures. Detailed lecture notes taken each year by the teaching assis-
tants and edited by the instructors have served as the basis of this book.
1.1   Pólya [238, p. 120].
1.2   Scorer, Grundy, and Smith [261].
1.5   Venn [294].
1.6   Steiner [278]; Roberts [251].
1.8   Lyness [209].
1.9   Cauchy [47, note 2, theorem 17].
1.10  Atkinson [13].
1.11  Inspired by Wood [307].
1.14  Steiner [278]; Pólya [238, chapter 3]; Brother Alfred [37].
1.17  Dudeney [72, puzzle 1].
1.21  Ball [16] credits B. A. Swinden.
1.22  Based on an idea of Peter Shor.*
1.23  Bjorn Poonen.*
1.25  Frame, Stewart, and Dunkel [105].
2.2   Iverson [161, p. 11].
2.3   [173, exercise 1.2.3-2].
2.5   [173, exercise 1.2.3-25].
2.22  Cauchy [47, note 2, theorem 16].
2.23  1982 final.
2.26  [173, exercise 1.2.3-26].
2.29  1979 midterm.
2.30  1973 midterm.
2.34  Riemann [250, section 3].
2.35  Euler [85] gave a fallacious “proof” using divergent series.
2.36  Golomb [120]; Ilan Vardi.*
2.37  Leo Moser.*
3.6   Ernst Mayr, 1982 homework.
3.8   Dirichlet [67].
3.9   Chace [48]; Fibonacci [98, pp. 77-83].
3.12  [173, exercise 1.2.4-48(a)].
3.13  Beatty [18]; Niven [224, theorem 3.7].
3.19  [173, exercise 1.2.4-34].
3.21  1975 midterm.
3.23  [173, exercise 1.2.4-41].
3.28  Brown [40].
3.30  Aho and Sloane [4].
3.31  Greitzer [135, problem 1972/3, solution 2].
3.32  [130].
3.33  1984 midterm.
3.34
3.35
3.36
3.37
3.38
3.39
3.40
3.41
3.42
3.45
3.46
3.48
3.51
3.52
4.4
4.16
4.19
4.21
4.22
4.23
4.24
4.26
4.31
4.36
4.37
4.38
4.39
4.40
4.41
4.42
4.44
4.45
4.47
4.48
4.52
4.53
4.54
4.56
1970 midterm.
1975 midterm.
1976 midterm.
1986 midterm;
[181].
1974 midterm.
1971 midterm.
1980 midterm.
Klamkin
[169,
problem 1978/3].
Uspensky
[292].
Aho
and Sloane
[4].
Graham and Pollak
[132].
R. L. Graham and D. R. Hofstadter.*
Fraenkel [
1031.
S.
K. Stein.*
[180,
$5261.
Sylvester
[283].
Bertrand
[23,
p.
1291;
Chebyshev
[50];
Wright
[309].
[178,
pp. 148-1491.
Brillhart
[34];
Williams and Dubner
[305].
Crowe
[58].
Legendre
[196,
second edition,
introduction].
[174,
exercise 4.5.3-43].
Pascal
[226].
Hardy and Wright
[150,
§14.5].
Aho
and Sloane
[4].
Lucas
[205].
[129].
Stickelberger
[280].
Legendre
[196,
§135];
Hardy and
Wright
[150,
theorem
82].
[174,
exercise 4.5.1-6].
[174,
exercise 4.5.3-39].
[174,
exercise 4.3.2-13].
Lehmer
[197].
Gauss
[115,
§78];
Crelle
[57].
1974 midterm.
1973 midterm, inspired by Rao
[244].
1974 midterm.
Logan
[202,
eq. (6.15)].
4.57
4.58
4.59
4.60
4.61
4.63
4.64
4.66
4.67
4.69
4.70
4.71
4.72
4.73
5.1
5.3
5.5
5.13
5.14
5.15
5.21
5.25
5.28
5.29
5.31
5.34
5.36
5.37
5.38
5.40
5.43
5.48
5.49
5.53
5.58
5.59
5.60
5.61
5.62
A special case appears in
[182].
5.63
1974 midterm.
Sierpinski
[266].
5.64
1980 midterm.
Curtiss
[59];
Erdős
[76].
5.65
1983 midterm.
Mills
[216].
5.66
1984 midterm.
[173,
exercise 1.3.2-19].
5.67
1976 midterm.
Barlow
[17];
Abel [1].
5.68
1985 midterm.
Peirce
[229].
5.69
Lyle Ramshaw, guest lecture in 1986.
Ribenboim
[249];
Sierpinski
[267,
5.70
Andrews
[9,
theorem
5.4].
problem P&l.
5.71
H. S. Wilf [304’, exercise
4.16].
[127].
5.72
Hermite
[154].
Cramer
[56].
5.74
1979 midterm.
P. Erdos.*
5.75
1971 midterm.
[77,
p.
96].
5.76
[173,
exercise 1.2.6-59 (corrected)].
[77,
p.
103].
5.77
1986 midterm.
Landau
[195,
volume 2, eq.
648].
5.78
[176].
Forcadel
[101].
5.79
Mendelsohn
[215];
Montgomery
[218].
Long and Hoggatt
[203].
5.81
1986 final exam.
1983 in-class final.
5.82
Hillman and Hoggatt
[157].
1975 midterm.
5.85
Hsu
[159].
[173,
exercise 1.2.6-20].
5.86
Good
[123].
Dixon
[68].
5.88
Hermite
[155].
Euler
[81].
5.91
Whipple
[301].
Gauss
[116,
§7].
5.92
Clausen
[51],
[52].
Euler
[95].
5.93
Gosper
[124].
Kummer
[187,
eq.
26.4].
5.94
Henrici
[152,
p.
118].
Gosper
[124].
5.95
[77,
p.
71].
Bailey
[15,
§10.4].
5.96
[77,
p.
71].
Kummer [188,
p.
116].
5.97
R. William Gosper, Jr.*
Vandermonde
[293].
6.6
Fibonacci
[98,
p.
283].
[173,
exercise 1.2.6-16].
6.15
[175,
exercise 5.1.3-2].
Rodseth
[252].
6.21
Theisinger
[287].
Pfaff
[233];
Saalschütz
[256];
6.25
Gardner
[112]
credits Denys Wilquin.
[173,
exercise 1.2.6-31].
6.27
Lucas
[205].
Ranjan
Roy.*
6.28
Lucas
[207,
chapter
18].
Roy [255, eq.
3.13].
6.31
Lah
[193];
R. W. Floyd.*
Gauss
[116];
Richard
Askey.*
6.35
1977 midterm.
Frazer and McKellar
[107].
6.37
Shallit
[263].
Stanford Computer Science Comprehensive Exam, Winter 1987.
6.39
[173,
exercise 1.2.7-15].
6.40
Klamkin
[169,
problem 1979/1].
[173,
exercise 1.2.6-41].
6.41
1973 midterm.
Lucas
[206].
6.43
Brooke and Wall
[36].
1971 midterm.
6.44
Matiiasevich
[213].
6.46
6.47
6.48
6.49
6.50
Francesca
[106];
Wallis
[295,
chapter 4].
Lucas
[205].
[174,
exercise 4.5.3-9(c)].
Davison
[61].
6.51
6.52
6.53
6.54
6.55
6.56
6.57
6.58
6.59
6.61
6.62
6.63
1985 midterm; Rham
[248];
Dijk-
stra
[66,
pp. 230-232].
Waring
[296];
Lagrange
[191];
Wolstenholme
[306].
Eswarathasan and Levine
[79].
Kaucky
[168]
treats a special case.
Staudt
[276];
Clausen
[53];
Rado
[242].
Andrews and Uchimura [12].
1986 midterm.
1984 midterm, suggested by R. W.
Floyd.*
[173,
exercise 1.2.8-30]; 1982 midterm.
Burr
[42].
6.65
6.66
6.67
6.70
6.72
6.73
1976 final exam.
Borwein and Borwein
[31,
§3.7].
[173,
section 1.2.10]; Stanley [275,
proposition 1.3.12].
Tanny
[286].
6.74
6.75
6.76
6.78
6.79
6.80
6.81
6.84
6.85
6.87
Logan [202’].
[175,
exercise 6.1-13].
Euler
[88,
part 2, chapter
8].
[175,
exercise 5.1.3-3].
Euler
[86,
chapters 9 and 10];
Schroter
[260].
Logan [202’].
Comic section, Boston Herald,
August 21, 1904.
Silverman and Dunn
[268].
[183].
[126],
modulo a numerical error.
[174,
exercises 4.5.3-2 and
3].
Adams and Davison
[3].
Lehmer
[198].
Burr
[42].
Part (a) is from Eswarathasan and
Levine [79].
7.2
[173,
exercise 1.2.9-1].
7.8
7.9
7.11
7.12
7.13
7.15
7.16
7.20
7.22
7.23
7.24
7.25
7.26
7.32
7.33
7.34
7.36
7.37
7.38
7.39
7.41
7.42
7.44
7.45
7.47
7.48
7.49
7.50
7.51
7.52
7.53
7.54
7.55
7.56
7.57
Zave
[311].
[173,
exercise 1.2.7-22].
1971 final exam.
[175,
pp. 63-64].
Raney
[243].
Bell
[20].
Polya
[237,
p.
149];
[173,
exercise
2.3.4.4-1].
Jungen
[167,
p.
299]
credits A.
Hurwitz.
Polya
[239].
1983 homework.
Myers
[222];
Sedláček
[262].
[174,
Carlitz’s proof of lemma 3.3.3B].
[173,
exercise 1.2.8-12].
[77,
pp. 25-26] credits L. Mirsky and
M. Newman.
1971 final exam.
Tomás Feder.*
1974 final exam.
Euler
[87,
§50];
1971 final exam.
1973 final exam.
[173,
exercise 1.2.9-18].
Andre
[8];
[175,
exercise 5.1.4-22].
1974 final exam.
Gross
[136];
[175,
exercise 5.3.1-3].
de Bruijn
[63].
Waugh and Maxfield
[297].
1984 final exam.
Waterhouse [296’].
Schroder
[259];
[173,
exercise 2.3.4.4-31].
Fisher
[99];
Percus
[232,
pp. 89-123];
Stanley
[274].
Hammersley
[146].
Euler
[92,
part 2, section 2, chapter 6,
§91].
Moessner
[217].
Stanley
[273].
Euler
[91].
[77,
p.
48]
credits P. Erdos and
P. Turan.
8.13
8.15
8.17
8.24
8.26
8.27
8.29
8.32
8.34
8.35
8.36
8.38
8.39
8.41
8.43
8.44
8.46
8.47
8.48
8.49
8.50
8.51
8.53
8.57
8.63
9.1
9.2
9.3
9.6
9.8
9.9
9.14
9.16
9.18
9.20
9.24
9.27
9.28
Thomas M. Cover.*
[173,
exercise 1.2.10-17].
Patil
[228].
John Knuth (age 4) and DEK;
1975
final.
[173,
exercise 1.3.3-18].
Fisher [100].
Guibas and Odlyzko
[138].
1977 final exam.
Hardy
[149]
has an incorrect analysis
leading to the opposite conclusion.
1981 final exam.
Gardner
[113]
credits George Sicher-
man.
[174,
exercise 3.3.2-10].
[177,
exercise 4.3(a)].
Feller
[96,
exercise IX.33].
[173,
sections 1.2.10 and
1.3.3].
1984 final exam.
Feller
[96]
credits Hugo Steinhaus.
1974 final, suggested by “fringe
analysis” of 2-3 trees.
1979 final exam.
Blom
[26];
1984 final exam.
1986 final exam.
1986 final exam.
Feller
[96]
credits S. N. Bernstein.
Lyle Ramshaw.*
Guibas and Odlyzko
[138].
Hardy
[148,
1.3(g)].
Part (c) is from Garfunkel [114].
[173,
exercise 1.2.11.1-6].
[173,
exercise 1.2.11.1-3].
Hardy
[148,
1.2(iv)].
Landau
[194,
vol. 1, p.
60].
[173,
exercise 1.2.11.3-6].
Knopp [170, edition ≥ 2, §64C].
Bender
[21,
§3.1].
1971 final exam.
[134,
§4.1.6].
Titchmarsh
[289].
[173,
exercise 1.2.11.2-7].
9.29
9.32
9.34
9.35
9.36
9.37
9.38
9.39
9.40
9.41
9.42
9.44
9.46
9.47
9.48
9.49
9.50
9.51
9.52
9.53
9.57
9.58
9.60
9.62
9.63
9.65
9.66
9.67
de Bruijn
[62,
section
3.7].
1976 final exam.
1973 final exam.
1975 final exam.
1980 class notes.
[174,
eq. 4.5.3-21].
1977 final exam.
1975 final exam, inspired by
Reich
[247].
1977 final exam.
1980 final exam.
1979 final exam.
Tricomi and Erdélyi
[290].
de Bruijn
[62,
§6.3].
1980 homework; [175, eq. 5.3.1-34].
1980 final exam.
1974 final exam.
1984 final exam.
[134,
§4.2.1].
Poincare
[235];
Borel
[30,
p.
27].
Polya and
Szegő
[240, part 1, problem
140].
Andrew M. Odlyzko.*
Henrici
[151,
exercise 4.9.8].
Ilan Vardi.*
Canfield
[43].
Ilan Vardi.*
M. P. Schutzenberger.*
Lieb [201’]; Stanley [275, exercise
4.37(c)].
Boas and Wrench
[27].
* Unpublished personal communication.
Index
WHEN AN INDEX ENTRY refers to a page containing a relevant exercise, the
answer to that exercise (in Appendix A) might divulge further information; an
answer page is not indexed here unless it refers to a topic that isn’t included
in the statement of the relevant exercise.
(Graffiti have been indexed too.)
Aaronson, Bette Jane, ix.
Abel, Niels Henrik, 578, 603.
Abramowitz, Milton, 42, 578.
Absolute convergence, 60-61, 64.
Absolute error, 438, 441.
Absolute value of complex number, 64.
Absorption identities, 157-158, 247.
Acton,
John Emerich Edward Dalberg,
baron, 66.
Adams, William Wells, 578, 604.
Addison-Wesley, ix.
Addition formula, 158-159, 245, 247.
Aho,
Alfred Vaino, 578, 602.
Ahrens, Wilhelm Ernst Martin Georg, 8,
578, 602.
Akhiezer, Naum Il’ich, 578.
Alfred [Brousseau], Brother Ulbertus, 580,
602.
Algebraic integers, 147.
Algorithms, analysis of, 138, 399-412.
divide and conquer, 79.
Euclid’s, 103, 123, 289-290.
Fibonacci’s, 95, 101.
Gosper’s, 224-226, 519.
greedy, 101, 281.
self-certifying, 104.
Alice, 31, 394-396, 416.
Allardice, Robert Edgar, 2, 578.
American Mathematical Society, viii.
AMS Euler, ix, 625.
Analysis of algorithms, 138, 399-412.
Analytic functions, 196.
Ancestor, 117, 277.
Andre, Antoine Desire, 578, 604.
Andrews, George W. Eyre, 215, 316, 515,
579, 603, 604.
Answers, notes on, viii, 483, 606.
Anti-difference operator, 48, 54, 456-457.
Approximation, 8, 76, 87-89, 110, 114,
425-482.
of sums by integrals, 45, 262-263,
455-461.
Archibald, Raymond Clare, 581.
Argument of hypergeometric, 205.
Arithmetic progression, 26, 30, 362.
Armageddon, 85.
Armstrong, Daniel Louis (= Satchmo), 80.
Ascents, 253-254, 256.
Askey,
Richard Allen, 603.
Associative law, 30, 61, 64.
Asymptotics, 8, 76, 110, 114, 425-482.
for sums, 87-89, 452-482.
Atkinson, Michael David, 579, 602.
Austin, A. K., 581.
Automaton, 391.
Automorphic numbers, 505.
Bois-Reymond, Paul David Gustav du, 426,
580, 589.
Boncompagni, Prince Baldassarre, 585.
Bootstrapping, 449-452.
Borchardt, Carl Wilhelm, 589.
Borel, Emile Felix Edouard Justin, 580, 605.
Borwein, Jonathan Michael, 580, 604.
Borwein, Peter Benjamin, 580, 604.
Bound variables, 22.
Boundary conditions, 24-25, 75, 86, 159.
Bowling, 6.
Box principle, 95, 130, 497.
Brahma, Tower of, 1, 4, 264.
Brent, Richard Peirce, 292, 510, 540, 580.
Bricks, 299, 360.
Brillhart, John David, 580, 602.
Brocot, Achille, 116, 580.
Broder, Andrei Zary, ix, 601.
Brooke, Maxey, 580, 603.
Brousseau, Brother Alfred, 580, 602.
Brown, Mark Robbin, 601.
Brown, Morton, 487, 580.
Brown, Roy Howard, ix.
Brown, Thomas Craig, 581, 602.
Brown, Trivial, 581.
Brown, William Gordon, 344, 581.
Brown University, ix.
Browning, Elizabeth Barrett, 306.
Bubblesort, 434.
Buckholtz, Thomas Joel, 593.
Burr, Stefan Andrus, 581, 604.
Calculators, 67, 330.
Calculus, vi, 33.
finite and infinite, 47-56.
Candy, 36.
Canfield, Earl Rodney, 577, 581, 605.
Cards, shuffling, 423.
stacking, 259-260, 295.
Carlitz, Leonard, 604.
Carroll, Lewis (= Dodgson, Rev. Charles
Lutwidge), 31, 279, 581, 582, 599.
Carry, 70, 233, 283, 537.
Cassini, Jean Dominique, 278, 581.
identity, 278-279, 286, 289, 296, 300.
Catalan, Eugene Charles, 203, 347, 581.
Catalan numbers, 181, 203, 303.
combinatorial interpretations, 344-346,
541.
generalized, 347.
table of identities, 203.
Cauchy,
Augustin Louis, 581, 602.
inequality, 64.
Čech, Eduard, vi.
Ceiling function, 67-69.
Center of gravity, 259-260.
Certificate of correctness, 104.
Chace, Arnold Buffum, 581, 602.
Chaimovich, M., 581.
Chain rule, 54, 469.
Change, 313-316, 360.
large amounts of, 330-332, 478.
Changing the index of summation, 30-31,
39.
Changing the tails of a sum, 452-455.
Cheating, viii, 158, 309, 374, 387.
Chebyshev, Pafnutiĭ L’vovich, 38, 145, 581,
602.
inequality, 376-377, 414, 416, 555.
summation inequalities, 38.
Cheese slicing, 19.
Chen, Pang-Chieh, 601.
Chinese Remainder Theorem, 126, 146.
Chu Shih-Chieh, 169.
Chung, Fan-Rong King, ix.
Clausen, Thomas, 582, 603, 604.
product identities, 241.
Clearly, clarified, 403, 556.
Cliches, 166, 310.
Closed form, 3, 7, 108, 317, 548.
Closed interval, 73-74.
Cobb, Tyrus Raymond, 195.
Coins, 313-316.
biased, 387.
fair, 387, 416.
flipping, 387-396.
spinning, 387.
Collingwood, Stuart Dodgson, 279, 582.
Collins, John, 594.
Colombo, Cristoforo (= Columbus, Christo-
pher), 74.
Colors, 482.
Columbia University, ix.
Combinations, 153.
Common logarithm, 435.
Commutative law, 30, 61, 64, 308.
relaxed, 31.
Complete graph, 354.
Complex factorial powers, 211.
Complex numbers, 64.
roots of unity, 149, 204, 361, 530, 550, 572.
Composite numbers, 105.
Composition of generating functions, 414.
Concrete Math Club, 74.
Concrete mathematics, defined, vi.
Conditional convergence, 59.
Conditional probability, 402-405, 410-411.
Confluent hypergeometric series, 206.
Congruences, 124-126.
Connection Machine, 131.
Contiguous hypergeometrics, 514.
Continuants, 287-295, 298, 300, 487.
Continued fractions, 287, 290-295, 304, 540.
Convergence, 206, 317, 517.
absolute, 60-61, 64.
conditional, 59.
Convex regions, 5, 20, 483.
Convolution, 197, 319, 339-350.
binomial, 351, 353.
identities for, 202, 258.
Conway, John Horton, 396, 566, 582.
Cotangent function, 272, 303.
Counting, combinations, 153.
cycle arrangements, 247-248.
derangements, 193-196, 199-200.
with generating functions, 306-316.
integers in intervals, 73-74.
necklaces, 139-141.
parenthesized formulas, 343-345.
permutations, 111, 253-254.
set partitions, 245.
spanning trees, 335, 354.
Coupon collecting, 558.
Cover, Thomas Merrill, 605.
Coxeter, Harold Scott Macdonald, 579.
Cramer, Carl Harald, 510, 582, 603.
Cray X-MP, 109.
Crelle, August Leopold, 582, 602.
Cribbage, 65.
Crispin, Mark Reed, 598.
Crowe, Donald Warren, 582, 602.
Crudification, 433.
Cubes, sum of consecutive, 51, 63, 269, 275,
353.
Cumulants, 383-387, 414, 415, 424.
CUNY (= City University of New York), ix.
Curtiss, David Raymond, 582, 603.
Cycles, 139, 245, 248, 486.
Cyclic shift, 12.
Cyclotomic polynomial, 149.
δ, see Finite calculus.
Δ, see Difference operator.
D, see Derivative operator.
David, Florence Nightingale, 577, 582.
Davison, John Leslie, 293, 578, 582, 604.
de Branges, Louis, 589.
de Bruijn, Nicolaas Govert, 430, 433, 486,
582, 604, 605.
cycle, 486.
de Moivre, Abraham, 283, 467, 582.
Definite sums, analogous to definite inte-
grals, 49-50.
Degenerate hypergeometric series, 210, 216,
222, 235.
Derangements, 193-196, 199-200, 379-380,
386-387, 414.
Derivative operator, 33, 47, 191, 219-221,
296, 319, 350-351, 456-457.
Descents, see Ascents.
dgf: Dirichlet generating function.
Dice, 367-370, 413, 415.
fair, 368, 403.
loaded, 368, 413.
nonstandard, 417.
supposedly fair, 378.
Dickson, Leonard Eugene, 496, 583.
Dieudonne, Jean Alexandre, 500.
Difference operator, 47-55, 456-457.
nth order, 187-192.
Differentiably finite power series, 360, 366.
Differential operators, see Derivative
operator and Theta operator.
Difficulty measure for summation, 181.
Dijkstra, Edsger Wybe, 173, 583, 604.
Dimers and dimes, 306, see Dominoes and
Change.
Diphages, 420, 424.
Dirichlet, Peter Gustav Lejeune, 356, 583,
602.
box principle, 95, 130, 497.
generating functions, 356-357, 359, 418,
437.
probability generating functions, 418.
Discrepancy, 88-89, 97, 304, 478, 481.
Discrete probability, 367-424.
defined, 367.
Disease, 319.
Distribution, of probabilities, 367.
of things into groups, 83-85.
Distributive law, 30, 35, 60, 64, 83.
Divergent sums, 60, 334, 517.
Divide and conquer, 79.
Divides exactly, 112-114, 146, 233.
Divisibility, 102-105.
of polynomials, 225.
Dixon, Alfred Cardew, 583, 603.
formula, 214.
DNA, Martian, 363.
Dodgson, Charles Lutwidge, see Carroll.
Dominoes, 306-313, 357.
Double sums, 34-41, 105, 237.
Doubly exponential recurrences, 97, 100,
101, 109.
Doubly infinite sums, 59, 98, 468-469.
Dougall, John, 171, 583.
Downward generalization, 2, 95, 306-307.
Doyle, Sir Arthur Conan, 162, 227-228, 391,
583.
Drones, 277.
Drysdale, Robert Lewis (Scot), III, 601.
du Bois-Reymond, Paul David Gustav, 426,
580, 589.
Duality, 63 (exercise 17), 68-69, 253, 515.
Dubner, Harvey, 600, 602.
Dudeney, Henry Ernest, 583, 602.
Dunkel, Otto, 586, 602.
Dunn, Angela Fox, 597, 604.
Dunnington, Guy Waldo, 583.
Duplication formulas, 186, 232.
Dupre, Lyn Oppenheim, ix.
Durst, Lincoln Kearney, viii.
Dyson, Freeman John, 172, 587.
e, 70, 122, 570.
E, 55, 188, 191.
Edwards, Anthony William Fairbank, 583.
Eeny-meeny-miny-mo, see Josephus problem.
Efficiency, 24.
egf: Exponential generating function.
Eggs, 158.
Egyptian mathematics, 95, 150, 581.
Einstein, Albert, 72, 293.
Eisele, Carolyn, 595.
Eisenstein, Ferdinand Gotthold Max, 202,
583.
Elementary events, 367-368.
Elkies, Noam David, 131.
Ellipsis
(...),
21, 50, 108.
Empirical estimates, 377-379, 413.
Empty case, 2, 244, 306-307, 335, 541.
Empty product, 48, 106.
Empty sum, 23, 48.
Entier function, see Floor function.
Equality, one-way, 432-433.
Equivalence relation, 124.
Eratosthenes, sieve of, 111.
Erdelyi, Arthur, 599, 605.
Erdős, Pál (= Paul), 510, 526, 550, 583-584,
603, 604.
Error, absolute versus relative, 438, 441.
Error function, 166.
Eswarathasan, Arulappah, 584, 604.
Euclid (= Εὐκλείδης), 107-108, 584.
algorithm, 103-104, 123, 289-290.
numbers, 108, 145, 150, 151.
Euler, Leonhard, i, vii, ix, 6, 48, 122, 131,
133, 134, 205, 207, 210, 232, 253, 263,
264, 272, 285, 287, 289, 455, 457, 499,
514, 550, 577, 579, 584-585, 602-604.
constant, 264, 292, 304, 467.
identity for hypergeometrics, 233.
numbers, 535, 591; see also Eulerian
numbers.
polynomials, 549.
summation formula, 455-461.
theorem, 133, 141, 147.
totient function, 133-135, 137-144, 357,
448-449.
triangle, 254, 303.
Eulerian numbers, 253-257, 296, 302, 364,
550.
combinatorial interpretations, 253-254, 534.
generalized, 299.
generating function for, 337.
second-order, 256-257.
Event, 368.
Eventually positive function, 428.
Exact cover, 362.
Exactly divides, 112-114, 146, 233.
Excedances, 302.
Exercises, levels of, viii, 72-73, 95, 497.
exp: Exponential function, 441.
Expectation, see Expected value.
Expected value, 371-373, 381.
Exponential function, discrete analog of, 54.
Exponential generating functions, 350-355,
407-408.
Exponential series, generalized, 200-202,
231, 350, 355.
Exponents, law of, 52.
ϕ, see Phi.
φ, see Euler’s totient function.
Factorial expansion of binomial coefficients,
156.
Factorial function, 111-115, 332-334.
approximation to, see Stirling’s formula.
duplication formula, 232.
generalized to nonintegers, 192, 210-211,
213-214, 302.
Factorial powers, 47-48, 63, 248.
complex, 211.
negative, 52-53, 63.
related to ordinary powers, 248-249, 572.
Factorization into primes, 106-107, 110.
Factorization of summation conditions, 36.
Fair coins, 387, 416.
Fair dice, 368, 403.
Falling factorial powers, 47.
complex, 211.
difference of, 48, 53.
negative, 188.
related to ordinary powers, 51, 248-249,
572.
related to rising powers, 63, 298.
Fans, ix, 193, 334.
Farey, John, series, 118-119, 134, 137, 150,
152, 448, 588.
Feder, Tomás, 604.
Feigenbaum, Joan, 601.
Feller, William, 367, 585, 605.
Fermat, Pierre de, 130, 131, 585.
numbers, 131-132, 145, 510.
Fermat’s Last Theorem, 130, 150, 509, 532.
Fermat’s theorem (= Fermat’s Little
Theorem), 131, 141, 149.
converse of, 148.
Fibonacci, Leonardo, 95, 278, 527, 585, 602,
603.
algorithm, 95, 101.
factorial, 478.
number system, 282-283, 287, 293, 296,
303.
odd and even, 293-294.
Fibonacci numbers, 276-287, 288, 307, 317.
combinatorial interpretations of, 277, 278,
288, 307.
generating function for, 283-285, 323-326,
337.
second-order, 361.
Fine, Henry Burchard, 595.
Fine, Nathan Jacob, 577.
Finite calculus, 47-56.
Finite state language, 391.
Finkel, Raphael Ari, 598.
Fisher, Michael Ellis, 585, 604.
Fisher, Sir Ronald Aylmer, 586, 605.
Fixed point, 12, 379-380, 386-387, 414.
Floor function, 67-69.
Floyd, Robert W, 603, 604.
Food, see Candy, Cheese, Eggs, Pizza,
Sherry.
Football, 182.
Football victory problem, 193-196, 199-200,
414.
generalized, 415.
mean and variance, 379-380, 386-387.
Forcadel, Pierre, 586, 603.
Formal power series, 206, 317, 517.
FORTRAN, 432.
Fourier, Jean Baptiste Joseph, 22, 586.
series, 481.
Fractional part, 70, 83, 87, 456.
Fractions, 116-123, 151.
basic, 134, 138.
continued, 287, 290-295, 304, 540.
partial, see Partial fraction expansions.
unit, 95, 150.
unreduced, 134-135, 151.
Fraenkel, Aviezri S, 500, 535, 586, 602.
Frame, James Sutherland, 586, 602.
Francesca, Piero della, 586, 604.
Fraser, Alexander Yule, 2, 578.
Frazer, William Donald, 586, 603.
Fredman, Michael Lawrence, 499, 586.
Free variables, 22.
Freyman,
Grigoriy Abelevich, 581.
Friendly monster, 526.
Frisbees, 420-421, 423.
Frye, Roger Edward, 131.
Fundamental Theorem of Arithmetic,
106-107.
Fundamental Theorem of Calculus, 48.
Fuss, Nicolaĭ Ivanovich, 347, 586.
Fuss-Catalan numbers, 347.
Fuss, Paul Heinrich von, 584.
γ, see Euler’s constant.
Γ, see Gamma function.
Gale, Dorothy, 556.
Games, see Bowling, Cards, Cribbage, Dice,
Penny ante, Sports.
Gamma function, 210-214, 468, 513.
Gardner, Martin, 586, 603, 605.
Garfunkel, J., 587, 605.
Gauß (= Gauss), Karl (= Carl) Friedrich,
vii, 6, 7, 123, 205, 207, 212, 496, 514,
583, 587, 602, 603.
identity for hypergeometrics, 222, 235.
trick, 6, 30, 112, 299.
gcd: Greatest common divisor.
Generalization, 11, 13, 16.
downward, 2, 95, 306-307.
Generalized binomial series, 200-204, 232, 240, 349.
Generalized exponential series, 200-202, 231,
350, 355.
Generalized factorial function, 192, 210-211,
213-214, 302.
Generalized harmonic numbers, 263, 269,
272, 297, 302, 356.
Generating functions, 196-204, 283-285,
306-366.
for Bernoulli numbers, 271, 337, 351.
for convolutions, 339-350, 355, 407.
Dirichlet, 356-357, 359, 418, 437.
for Eulerian numbers, 337.
exponential, 350-355.
for Fibonacci numbers, 283-285, 323-326,
337.
of generating functions, 337, 339, 407.
for harmonic numbers, 337-338.
Newtonian, 364.
for probabilities, 380-387.
for simple sequences, 321.
for Stirling numbers, 337, 407.
super, 339, 407.
Genocchi, Angelo, 587.
numbers, 528, 549.
Geometric progression, 32-33, 54, 114,
205-206.
Gessel, Ira Martin, 256, 587.
Gibbs, Josiah Willard, 599.
Gilbert, William Schwenck, 430.
Ginsburg, Jekuthiel, 587.
Glaisher, James Whitbread Lee, constant,
569.
God, 1, 293.
Goldbach, Christian, 584.
theorem, 66.
Golden ratio, 285.
Golf, 417.
Golomb, Solomon Wolf, 446, 493, 587, 602.
self-describing sequence, 66, 481.
Good, Irving John, 587, 603.
Goodfellow, Geoffrey Scott, 598.
Gopinath, Bhaskarpillai, 487, 592.
Gordon, Peter Stuart, ix.
Gosper, Ralph William, Jr., 224, 487, 540,
587, 603.
algorithm, 224-226, 519.
algorithm, examples, 227-228, 233, 519.
goto,
considered harmful, 173.
Gottschalk, Walter Helbig, vii.
Graffiti, vii, ix, 59, 606.
Graham, Cheryl, ix.
Graham, Ronald Lewis, iii, iv, vi, ix, 102,
492, 582-584, 587-588, 598, 601, 602.
Grandi, Luigi Guido, 58, 588.
Graph, 334, 360.
Graves, William
Henson,
601.
Gravity, center of, 259-260.
Gray, Frank, code, 483.
Greatest common divisor, 92, 103-104, 107,
145.
Greatest integer function, see Floor func-
tion.
Greatest lower bound, 65.
Greed, 74, 373-374; see also Rewards.
Greedy algorithm, 101, 281.
Green, Research Sink, 581.
Greene, Daniel Hill, 588.
Greitzer, Samuel Louis, 588, 602.
Gross, Oliver Alfred, 588, 604.
Grünbaum, Branko, 484, 588.
Grundy, Patrick Michael, 597, 602.
Guibas, Leonidas Ioannis (= Leo John), 588,
601, 605.
Guy, Richard Kenneth, 500, 510, 588.
Haar, Alfred, vii.
Hacker’s Dictionary, 124, 598.
Haiman, Mark, 601.
Half-open interval, 73-74.
Hall, Marshall, Jr., 588.
Halmos, Paul Richard, v, vi, 588.
Halphen, Georges Henri, 291, 588.
Halving, 79, 186-187.
Hamburger, Hans Ludwig, 566, 589.
Hammersley, John Michael, v, 589, 604.
Hanoi, Tower of, 1-4, 26-27, 109, 146.
variations on, 17-19.
Hansen, Eldon Robert, 42, 589.
Hardy, Godfrey Harold, 111, 428, 589, 602,
605.
Harmonic numbers, 29, 258-268, 466.
analogous to logarithms, 53.
approximate values of, 262-264.
complex, 297, 302.
divisibility of, 297, 300, 304.
generalized, 263, 269, 272, 297, 302, 356.
generating function for, 337-338.
second-order, 263, 266, 297, 529.
sums of, 41, 56, 265-268, 298-299, 302,
340-341.
Harmonic series, divergence of, 62, 262.
Harry, Matthew Arnold, double sum, 237.
Hashing, 397-412.
Hats, 193-196, 199-200, 379-380, 386-387,
414, 415.
hcf, 103.
Heath-Brown, David Rodney, 599.
Heiberg, Johan Ludvig, 584.
Heisenberg, Werner Karl, 467.
Helmbold, David Paul, 601.
Henrici, Peter Karl Eugen, 318, 526, 576,
589, 603, 605.
Hermite, Charles, 524, 532, 589, 603.
Herstein, Israel Nathan, 8, 589.
Hexagon property, 155, 230, 239.
Hillman, Abraham P, 589, 603.
Hoare, Charles
Antony
Richard, 28, 73, 589.
Hofstadter, Douglas Richard, 602.
Hoggatt, Verner Emil, Jr., 589, 593, 603.
Holden,
Edward Singleton, 595.
Holmboe, Berndt Michael, 578.
Holmes, Thomas Sherlock Scott, 162,
227-228.
Holomorphic functions, 196.
Horses, 17, 454, 489.
Hsu,
Lee-Tsch (= Lietz = Leetch)
Ching-Siur, 589, 603.
Hurwitz, Adolf, 604.
Hyperbolic functions, 271-272.
Hyperfactorial, 231, 477.
Hypergeometric series, 204-223.
degenerate, 210, 216, 222, 235.
differential equation for, 219-221.
partial sums of, 165-166, 223-230, 233.
transformations of, 216-223, 235, 241.
Hypergeometric terms, 224, 231, 233.
i, 22.
ℑ: Imaginary part, 64.
Implicit recurrences, 136-138, 193-194, 270.
Indefinite summation, 48-49, 55-56, 161,
224-230.
Independent random variables, 370, 413, 423.
Index set, 22, 30, 61.
Index variable, 22, 34, 60.
Induction, 3, 7, 10-11, 17, 43.
backwards, 18.
basis of, 3, 306-307.
failure of, 550.
important lesson about, 494, 526.
Inductive leap, 4, 43.
Inequality, Cauchy’s, 64.
Chebyshev’s, 376-377, 414, 416, 555.
Chebyshev’s summation, 38.
Infinite sums, 56-62, 64.
Information retrieval, 397-399.
Inkeri, Kustaa, 509, 590.
INT function, 67.
Integer part, 70.
Integration, 45-46, 48, 319, 351.
by parts, 54, 458.
Interchanging the order of summation,
34-41, 105, 136, 183, 185.
Interpolation, 191-192.
Intervals, 73-74.
Invariant relation, 117.
Inverse modulo m, 125, 132, 147.
Inversion formulas, 136, 138, 192-193.
Irrational numbers, 87, 122-123.
Iverson, Kenneth Eugene, 24, 67, 590, 602.
convention, 24, 31, 34, 68, 75, 587.
Jacobi, Carl Gustav Jacob, 64, 590.
Jarden, Dov, 533, 590.
Jeopardy, 347.
Joint distribution, 370.
Jonassen, Arne Tormod, 590.
Jones, Bush, 590.
Josephus, Flavius, 8, 12, 19-20, 590.
numbers, 81, 97, 100.
problem, 8-17, 79-81, 95, 100, 144.
recurrence, generalized, 13-16, 79-81.
subset, 20.
Jouaillec, Louis Maurice, 601.
Jungen,
R., 590, 604.
Kafkaesque scenario, 260.
Kaplansky, Irving, 8, 589.
Karlin, Anna Rochelle, 601.
Kaucky, Josef, 590, 604.
Kellogg, Oliver Dimon, 582.
Kent, Clark (= Kal-El), 358.
Kernel functions, 356.
Ketcham, Henry King, 148.
Kilometers, 287, 296.
Kilroy, James Joseph, vii.
Kipling, Joseph Rudyard, 246.
Kissinger, Henry Alfred, 365.
Klamkin, Murray Seymour, 590, 602, 603.
Klarner, David Anthony, 601.
Knockout tournament, 418-419.
Knopp, Konrad, 590, 605.
Knuth, Donald Ervin, iii-vi, viii, ix, 102,
253, 397, 492, 531, 588, 590-591, 601,
605, 625.
numbers, 78, 97, 100.
Knuth, John Martin, 605.
Knuth, Nancy Jill Carter, ix.
Kramp, Christian, 111, 591.
Kronecker, Leopold, delta notation, 24.
Kummer, Ernst Eduard, 206, 514, 591-592,
603.
formula for hypergeometrics, 213, 217.
Kurshan, Robert Paul, 487, 592.
A-notation, 65.
Lagny, Thomas Fantet de, 290, 592.
Lagrange (= de la Grange), Joseph Louis,
comte, 592, 604.
identity, 64.
Lah, Ivo, 592, 603.
Landau, Edmund Georg Hermann, 429, 434,
592, 603, 605.
Laplace, Pierre Simon, marquis de, 452, 580,
592.
Last but not least, 132, 455.
Law of Large Numbers, 377.
lcm: Least common multiple, 103.
Least common multiple, 103, 107.
Least integer function, see Ceiling function.
Least upper bound, 57, 61.
LeChiffre, Mark Well, 148.
Left-to-right maxima, 302.
Legendre, Adrien Marie, 548, 592, 602.
Lehmer, Derrick Henry, 592, 602, 604.
Leibniz, Gottfried Wilhelm, Freiherr von, vii,
168, 588, 593.
Lekkerkerker, Cornelius Gerrit, 593.
Levels of exercises, viii, 72-73, 95, 497.
Levine, Eugene, 584, 604.
Lexicographic order, 427.
lg: Binary logarithm, 70.
L’Hospital, Guillaume François Antoine de,
marquis de Sainte Mesme, rule, 326,
382.
Liang, Franklin Mark, 601.
Lieb, Elliott Hershel, 593, 605.
Lies, and statistics, 195.
Lincoln, Abraham, 387.
Lines in the plane, 4-8, 17, 19.
Little oh notation, 434.
ln: Natural logarithm, 262.
log: Common logarithm, 435.
Logan, Benjamin Franklin (= Tex), Jr., 273,
593, 602-604.
Logarithmico-exponential functions,
428-429.
Logarithms, 53-54, 70, 262, 435.
Long, Calvin Thomas, 593, 603.
Lottery, 373-374, 422-423.
Lower index, 154.
Lower parameters, 205.
Loyd, Samuel, 536, 593.
Lucas, François Édouard Anatole, 1, 278,
593, 602-604.
numbers, 298, 302.
Lyness, Robert Cranston, 487, 593, 602.
Lytton, Edward George Earle Lytton
Bulwer, baron, v.
μ, see Möbius function.
Maclaurin, Colin, 455, 593.
MacMahon,
Maj. Percy Alexander, 140, 593.
MACSYMA, 42, 525.
Magic tricks, 279.
Mallows, Colin Lingwood, 492.
Markov, Andreĭ Andreevich (the elder),
processes, 391.
Martian DNA, 363.
Mathematical induction, 3, 7, 10-11, 17, 43.
backwards, 18.
basis of, 3, 306-307.
failure of, 550.
important lesson about, 494, 526.
Mathews, Edwin Lee (= 41), 8, 21, 94, 105,
106, 329.
Matiiasevich (= Matijasevich), Iurii (= Yuri)
Vladimirovich, 280, 593, 603.
Maxfield, Margaret Waugh, 599, 604.
Mayr, Ernst, ix, 601, 602.
McEliece, Robert James, 71.
McGrath, James Patrick, 601.
McKellar, Archie Charles, 586, 603.
Mean (average) of a probability distribution,
370-381.
Median, 370, 371, 423.
Mediant, 116.
Melzak, Zdzislaw Alexander, vi, 594.
Mendelsohn, Nathan Saul, 594, 603.
Merchant, Arif Abdulhussein, 601.
Merging, 79, 175.
Mersenne, Marin, 109, 131, 585.
numbers, 109-110, 151, 278.
primes, 109-110, 127, 507.
Miles, 287, 296.
Mills, Stella, 593.
Mills, William Harold, 594, 603.
Minimum, 65, 237, 363.
Mirsky, Leon, 604.
Mixture of probability distributions, 414.
Mobius, August Ferdinand, 136.
function, 136-139, 357, 448-449, 501.
mod: binary operation, 81-85.
mod: congruence relation, 123-126.
mod 0, 82-83, 500.
Mode, 370, 371, 423.
Modular arithmetic, 123-129.
Modulus, 82.
Moessner, Alfred, 594, 604.
Moments, 384-385.
Montgomery, Peter Lawrence, 594, 603.
Moriarty, James, 162.
Morse, Samuel Finley Breese, code, 288, 310.
Moser, Leo, 594, 602.
Motzkin, Theodor Samuel, 533, 539, 590,
594.
Mountain ranges, 345, 541.
Mozzochi, Charles Jeffrey, 594.
Mu function, 136-139, 357, 448-449, 501.
Multinomial coefficients, 168, 171-172, 240,
545.
Multiple of a number, 102.
Multiple sums, 34-41, 61.
Multiple-precision numbers, 127.
Multiplicative functions, 134-136, 357.
Multisets, 77, 256.
Mumble function, 83, 84, 492, 499.
Mumble-fractional part, 88.
Murdock, Phoebe James, viii.
Murphy’s Law, 74.
Myers, Basil Roland, 594, 604.
ν, see Nu function.
nth difference, 267.
Name and conquer, 2, 32, 88, 139.
National Science Foundation, ix.
Natural logarithm, 53-54, 262.
Naval Research, ix.
Navel research, 285.
Nearest integer, 95.
Necessary and sufficient condition, 72.
Necklaces, 139-141, 245.
Negating the upper index, 164-165.
Negative binomial distribution, 388-389,
414.
Negative factorial powers, 52, 63, 188.
Newman, James Roy, 600.
Newman, Morris, 604.
Newton, Sir Isaac, 189, 263, 594.
series, 189-192.
Newtonian generating function, 364.
Niven, Ivan Morton, 318, 594, 602.
Nontransitive paradox, 396.
Normal distribution, 424.
Notation, x-xi, 2, 21-25, 48-49, 67-70, 73,
81, 102, 111, 115, 123-124, 194, 243.
extension of, 49, 52, 154, 210-211, 252,
257, 297.
ghastly, 67, 175.
need for new, 83, 115, 253.
Nu function, 12, 114, 146, 529.
Null case, 2, 306-307, 335, 541.
Number system, 107, 119.
binomial, 234.
Fibonacci, 282, 296, 303.
prime-exponent, 107, 116.
radix, 11, 16, 109, 146, 148, 195, 233, 446,
511.
residue, 126-129, 144.
Stern-Brocot, 119-123, 146, 292, 504, 527.
Number theory, 102-152.
o, considered harmful, 434-435.
O-notation, 76, 429-435.
Obvious, clarified, 403, 511.
Odds, 396.
Odlyzko, Andrew Michael, 81, 540, 588, 605.
Office of Naval Research, ix.
One-way equalities, 432-433.
Open interval, 73-74, 96.
Operators, 47, 55, 219.
Optical illusions, 278, 279, 536.
Organ-pipe order, 509.
π, 26, 70, 146, 232, 471, 540, 570.
Π-notation, 64, 106.
Pacioli, Luca, 586.
Palais, Richard Sheldon, viii.
Paradoxes, 279, 396, 515.
Paradoxical sums, 57.
Parallel summation, 159, 174, 208-209.
Parentheses, 343-345.
Parenthesis conventions, xi.
Partial fraction expansions, 64, 189, 284-285,
324-327, 360, 362, 462, 490, 535.
Partial quotients, 292, 304, 540.
Partial sums, 48-49, 55-56, 161, 165-166,
223-230, 233.
required to be positive, 345-348.
Partition into nearly equal parts, 83-85.
Partitions, of the integers, 77-78, 99, 101.
of a number, 316.
of a set, 244-245.
Pascal, Blaise, 155, 156, 594, 602.
Pascal’s triangle, 155.
extended upward, 164.
row products, 231.
row sums, 163, 165.
variant of, 238.
Patashnik, Amy Markowitz, ix.
Patashnik, Oren, iii, iv, vi, ix, 102, 492, 588,
601.
Patil, Ganapati Parashuram, 594, 605.
Peirce, Charles Santiago Sanders, 510, 595,
603.
sequence, 151.
Penney, Walter Francis, 394, 595.
Penney ante, 394-396, 416, 423, 424.
Pentagon, 300 (exercise 46), 416, 420.
Pentagonal numbers, 366.
Percus, Jerome Kenneth, 595, 604.
Perfect powers, 66.
Periodic recurrences, 20, 179.
Permutations, 111-112, 193-196.
ascents in, 253-254, 256.
up-down, 363.
Personal computer, 109.
Perturbation method, 32-33, 43-44, 64, 179,
270-271.
Pfaff, Johann Friedrich, 207, 217, 595, 603.
reflection law, 217, 235.
pgf: Probability generating function.
Phages, 420, 424.
Phi (= Golden ratio), 70, 97, 285-287, 296,
530.
Phi function (= Totient function), 133-135,
137-144, 357, 448-449.
Phidias, 285.
Philosophy, vii, 11, 16, 46, 71, 72, 75, 91,
170, 181, 194, 317, 453, 489, 494, 577.
Phyllotaxis, 277.
Pi, 26, 70, 146, 232, 471, 540, 570.
Pig, Porky, 482.
Pigeonhole principle, 130.
Pincherle, Salvatore, 589.
Pisano, Leonardo, 585, see Fibonacci.
Pittel, Boris Gershon, 552.
Pizza, 4, 409.
Planes, cutting, 19.
Pneumathics, 164.
Pochhammer, Leo, 48, 595.
symbol, 48.
Pocket calculators, 67, 330.
Poincaré, Jules Henri, 595, 605.
Poisson, Siméon Denis, 457, 595.
distribution, 414, 554.
summation formula, 576.
Pollak, Henry Otto, 588, 602.
Pólya, George (= György), vi, 16, 313, 494,
595, 602, 604, 605.
Polygons, 20, 360, 365.
Polynomial argument, 158, 163, 210.
Polynomially recursive sequence, 360.
Polynomials, 189-191.
degree of, 158, 226.
divisibility of, 225.
reflected, 325.
Poonen, Bjorn, 487, 595, 602.
Porter, Thomas K, 601.
Portland cement, see Concrete (in another
book).
Power series, 196, see Generating functions.
formal, 206, 317, 517.
Pr, 367-368.
Pratt, Vaughan Ronald, 601.
Primality testing, 110, 148.
Prime numbers, 23, 105-111, 442.
largest known, 109-110.
Mersenne, 109-110, 127, 507.
size of nth, 110-111, 442-443.
Prime to, 115.
Prime-exponent representation, 107, 116.
Princeton University, ix, 413.
Probabilistic analysis of an algorithm,
399-412.
Probability, 195, 367-424.
conditional, 402-405, 410-411.
discrete, 367-424.
distribution, 367.
generating function, 380-387.
space, 367.
Product of consecutive odd numbers, 186,
256.
Product notation, 64, 106.
Progression, arithmetic, 26, 30, 362.
geometric, 32-33, 54, 114, 205-206.
Proof, 4, 7.
Property, 23, 34.
Pulling out the large part, 439, 444.
Puns, ix, 220.
Pythagoras of Samos, theorem, 495.
Quadratic domain, 147.
Questions, levels of, viii, 72-73, 95, 497.
Quicksort, 28.
Quotation marks, xi.
Quotient, 81.
ℜ: Real part, 64, 212, 437.
Rabbits, 296.
Radix notation, 11, 16, 109, 146, 148, 195,
233, 446, 511.
Radix-2 representation, 11-13, 15, 70, 113.
Rado, Richard, 595, 604.
Rainville, Earl David, 514, 595.
Ramanujan Aiyangar, Srinivasa, 316.
Ramshaw, Lyle Harold, 73, 601, 603, 605.
Random variables, 369-372.
independent, 370, 413, 423.
Raney, George Neal, 345, 348, 596, 604.
lemma, 345-346.
lemma, generalized, 348, 358.
sequences, 347.
Rao, D. Rameswar, 596, 602.
Rational function, 207, 324.
Rayleigh, John William Strutt, baron, 77,
596.
Real part, 64, 212, 437.
Reciprocity law, 94.
Recorde, Robert, 432, 596.
Recurrences, 1, 3-4, 6, 10, 13, 78-81, 103,
159, 323.
doubly exponential, 97, 100, 101, 109.
implicit, 136-138, 193-194, 270.
periodic, 20, 179.
solving, 323-336.
and sums, 25-29.
unfolding, 6, 100, 159-160, 298.
unfolding asymptotically, 442.
Referee, 175.
Reference books, 42, 223, 590.
Reflected light rays, 277.
Reflected polynomial, 325.
Reflection law for hypergeometrics, 217, 235.
Regions, 4-5, 17, 19.
Reich, Simeon, 596, 605.
Relative error, 438, 441.
Relatively prime integers, 108, 115-123.
Remainder after division, 81.
Remainder in Euler’s summation formula,
457, 460-461, 465-466.
Renz, Peter Lewis, viii.
Repertoire method, 15, 19, 26, 44-45, 63,
238, 298, 300, 358.
Replicative function, 100.
Residue number system, 126-129, 144.
Retrieving information, 397-399.
Rewards, monetary, ix, 242, 483, 510, 550.
Rham, Georges de, 596, 604.
Ribenboim, Paolo, 532, 596, 603.
Rice, Stephan Oswald, 595.
Rice University, ix.
Riemann, Georg Friedrich Bernhard, 205,
596, 602.
hypothesis, 511.
zeta function, 65, 263-264, 272, 356-357,
449, 511, 542, 547, 569, 571, 575.
Rising factorial powers, 48, 63, 211.
related to falling powers, 63, 298.
related to ordinary powers, 249, 572.
Roberts, Samuel, 596, 602.
Rocky road, 36.
Rødseth, Øystein Johan, 596, 603.
Rolletschek, Heinrich Franz, 499.
Roots of unity, 149, 204, 361, 530, 550, 572.
modulo m, 128-129.
Rosser, John Barkley, 111, 596.
Rota, Gian-Carlo, 501, 596.
Roulette wheel, 74-75.
Rounding, unbiased, 492.
Roy, Ranjan, 596, 603.
Rubber band, 260-261, 264, 298, 479.
Ruler function, 113, 146, 148.
Running time, 411-412.
Ruzsa, Imre Zoltan, 584.
σ, 374.
Σ-notation, 22-25.
Saalschütz, Louis, 596, 603.
identity, 214.
Sample mean and variance, 377-379, 413.
Samplesort, 340.
Sandwiching, 157, 165.
Sárközy, András, 526, 596.
Sawyer, Walter Warwick, 207, 597.
Schäffer, Alejandro Alberto, 601.
Schinzel, Andrzej, 510.
Schlömilch, Oscar Xaver, 597.
Schoenfeld, Lowell, 111, 596.
Schönheim, Johanen, 581.
Schröder, Ernst, 597, 604.
Schrödinger, Erwin, 416.
Schröter, Heinrich Eduard, 597, 604.
Schützenberger, Marcel Paul, 605.
Scorer, Richard Segar, 597, 602.
Searching a table, 397-399.
Seaver, George Thomas (= 41), 8, 21, 94,
105, 106, 329.
Second-order Eulerian numbers, 256-257.
Second-order Fibonacci numbers, 361.
Second-order harmonic numbers, 263, 266,
297, 529.
Sedgewick, Robert, 601.
Sedláček, Jiří, 597, 604.
Self-certifying algorithms, 104.
Self-describing sequence, 66, 481.
Self reference, 59, 515-524, 588, 620.
Set inclusion in O-notation, 432.
Shallit, Jeffrey Outlaw, 597, 603.
Sharkansky, Stefan Michael, 601.
Sharp, Robert Thomas, 259, 597.
Sherry, 419.
Shift operator, 55, 188, 191.
Shiloach, Joseph (= Yossi), 601.
Shor, Peter Williston, 602.
Sicherman, George Leprechaun, 605.
Sideways addition, 12, 114, 146, 238, 529.
Sierpiński, Wacław, 87, 597, 603.
Sieve of Eratosthenes, 111.
Sigma-notation, 22-25.
Signum, 488.
Silverman, David L, 597, 604.
Skepticism, 71.
Skiena, Steven Sol, 526.
Slater, Lucy Joan, 223, 597.
Sloane, Neil James Alexander, 42, 327, 578,
597, 602.
Small cases, 2, 5, 9, 155, 306-307, 316.
Smith, Cedric Austen Bardell, 597, 602.
Snowwalker, Luke, 421.
Solov’ev, Aleksandr Danilovitch, 394, 598.
Solution, 3, 323.
Sorting, 28, 79, 175, 340, 434.
Spanning trees, 334-336, 342, 354-355, 360.
Spec, 77-78, 96, 97, 99, 101.
Special numbers, 243-305.
Spectrum, 77-78, 96, 97, 99, 101, 293, 304.
Spiral function, 99.
Spohn, William Gideon, Jr., 598.
Sports, see Baseball, Football, Frisbees,
Golf, Tennis.
Square pyramidal numbers, 42.
Square root, of 1 (mod m), 128-129.
of 2, 100.
of 3, 364.
Squarefree, 145, 151, 359.
Squares, sum of consecutive, 41-46, 51, 180,
233, 255, 270, 274, 353, 430, 456.
Stack size, 346-347.
Stacking cards, 259-260, 295.
Stallman, Richard Matthew, 598.
Standard deviation, 374, 376-380.
Stanford University, v, vii, ix, 413, 625.
Stanley, Richard Peter, 256, 519, 587, 598,
604, 605.
Staudt, Karl Georg Christian von, 598, 604.
Steele, Guy Lewis, Jr., 598.
Stegun, Irene Anne, 42, 578.
Stein, Sherman Kopald, 602.
Steiner, Jacob, 5, 598, 602.
Steinhaus, Hugo Dyonizy, 605.
Stengel, Charles Dillon (= Casey), 42.
Step functions, 87.
Stern, Moriz Abraham, 116, 598.
Stern-Brocot number system, 119-123, 146,
292, 504, 527.
Stern-Brocot tree, 116-123, 291-292, 364,
510.
Stern-Brocot wreath, 500.
Stewart, Bonnie Madison, 586, 602.
Stickelberger, Ludwig, 598, 602.
Stieltjes, Thomas Jan, 589.
constants, 569, 575.
Stirling, James, 192, 210, 243, 244, 283, 467,
598.
constant, 467, 471-475.
formula, 112, 467-468, 477.
formula, perturbed, 440-441.
numbers, see Stirling numbers.
polynomials, 257-258, 276, 297, 338-339.
triangles, 244, 245, 253.
Stirling numbers, 243-253, 275-276, 478,
577.
combinatorial interpretations, 244-248.
convolution formulas, 258, 276.
of the first kind, 245.
generalized, 257-258, 302, 304, 572.
generating functions for, 337.
identities for, 250-251, 258, 276, 303, 364.
of the second kind, 244.
as sums of products, 545.
Stone, Marshall Harvey, vi.
Straus, Ernst Gabor, 539, 584, 594.
Subfactorial, 194, 238.
Summand, 22.
Summation, 21-66.
asymptotic, 87-89, 452-482.
changing the index of, 30-31, 39.
definite, 49-50.
difficulty measure for, 181.
over divisors, 104-105, 135-137, 141, 356.
factor, 27-29, 64, 261.
indefinite, 48-49, 55-56, 161, 224-230.
infinite, 56-62, 64.
interchanging the order of, 34-41, 105,
136, 183, 185.
parallel, 159, 174, 208-209.
by parts, 54-56, 63, 265.
over triangular arrays, 36-41.
on the upper index, 160-161, 176.
Sums, 21-66.
absolutely convergent, 60-61, 64.
approximation of, by integral, 45, 262-263,
455-461.
of consecutive cubes, 51, 63, 269, 275, 353.
of consecutive integers, 6, 44, 65.
of consecutive mth powers, 42, 269-271,
274-276, 352-354.
of consecutive squares, 41-46, 51, 180,
255, 270, 274, 353, 430, 456.
divergent, 60, 517.
double, 34-41, 105, 237.
doubly infinite, 59, 98, 468-469.
empty, 23, 48.
floor/ceiling, 86-94.
formal, 307, 317-318.
of harmonic numbers, 41, 56, 265-268,
298-299, 302, 340-341.
hypergeometric, see Hypergeometric
series.
infinite, 56-62, 64.
multiple, 34-41, 61.
notations for, 21-25.
paradoxical, 57.
partial, 48-49, 55-56, 161, 165-166,
223-230, 233.
and recurrences, 25-29.
tail of, 452-455.
Sun Tsŭ, 126.
Sunflower, 277.
Super generating functions, 339, 407.
Superfactorial, 149, 231.
Swanson, Ellen Esther, viii.
Sweeney, Dura Warren, 598.
Swinden, B.A., 602.
Sylvester, James Joseph, 598, 602.
Symmetry identities, 156, 254.
Szegedy, Mario, 510, 581,
5!39.
Szegő, Gábor, 595, 605.
ϑ, see Theta operator.
Θ, see Big Theta notation.
Tail inequalities, 414, 416.
Tail of a sum, 452-455.
Tale of a sum, see Squares.
Tangent function, 273, 303.
Tangent numbers, 273.
Tanner, Jonathan William, 131, 599.
Tanny, Stephen Michael, 599, 604.
Tartaglia, Nicolò, triangle, 155.
Taylor, Brook, series, 163, 191, 382, 456-457.
Telescoping, 50.
Tennis, 418-419.
Term, 21.
Term ratio, 207-209, 211-212.
TeX, 219, 418, 625.
Thackeray, Henry St. John, 590.
Theisinger, Ludwig, 599, 603.
Theory of numbers, 102-152.
Theory of probability, 367-424.
Theta functions, 469, 509.
Theta operator, 219-221, 296.
Thiele, Thorvald Nicolai, 383, 384, 599.
Thinking, 489.
big, 2, 427, 444, 469, 472.
not at all, 56, 489.
small, see Downward generalization, Small
cases.
Three-dots (...) notation, 21, 50, 108.
Titchmarsh, Edward Charles, 599, 605.
Todd, H., 487.
Tong, Christopher Hing, 601.
Totient function, 133-135, 137-144, 357,
448-449.
Toto, 556.
Tournament, 418-419.
Tower of Brahma, 1, 4, 264.
Tower of Hanoi, 1-4, 26-27, 109, 146.
variations on, 17-19.
Trabb Pardo, Luis Isidoro, 601.
Transitive law, 124.
failure of, 396.
Traps, 154, 157, 183, 222.
Trees, of bees, 277.
binary, 117.
spanning, 334-336, 342, 354-355, 360.
Stern-Brocot, 116-123, 291-292, 364, 510.
Triangular array, summation over, 36-41.
Triangular numbers, 6, 366.
Tricomi, Francesco Giacomo Filippo, 599,
605.
Trigonometric functions, 272-273, 300, 303,
365, 423.
Trinomial coefficients, 168, 171, 476, 546.
Triphages, 420.
Trivial, clarified, 129, 403, 590.
Turán, Paul, 604.
Typeface, viii-ix, 625.
Uchimura, Keisuke, 579, 604.
Unbiased estimate, 378, 415.
Unbiased rounding, 492.
Uncertainty principle, 467.
Unexpected sum, 167, 215.
Unfolding a recurrence, 6, 100, 159-160, 298.
asymptotically, 442.
Ungar, Peter, 599.
Uniform distribution, 87, 381-382, 404-405.
Uniformity, deviation from, 152; see also
Discrepancy.
Unique factorization, 106-107, 147.
Unit, 147.
Unit fractions, 95, 150.
Unwinding a recurrence, 6.
Up-down permutations, 363.
Upper index, 154.
Upper negation, 164-165.
Upper parameters, 205.
Upper summation, 160-161, 176.
Useless identity, 223.
Uspensky, James Victor, 587, 599, 602.
Vandermonde, Alexandre Théophile, 169,
599, 603.
convolution, 169-170, 187, 198, 201,
211-212, 236.
Vanilla, 36.
Vardi, Ilan, 510, 526, 577, 602, 605.
Variance of a probability distribution,
373-383, 405-411.
Veech, William Austin, 499.
Venn, John, 484, 599, 602.
diagram, 17, 20.
Venture capitalists, 479-480.
Violin string, 29.
Vocabulary, 75.
Voltaire, de (= Arouet, François Marie), 436.
Vyssotsky, Victor Alexander, 526.
Wagstaff, Samuel Standfield, Jr., 131, 599.
Wall, Charles Robert, 580, 603.
Wallis, John, 599, 604.
Wapner, Joseph A., 43.
War, 8, 85, 420.
Waring, Edward, 599, 604.
Waterhouse, William Charles, 599.
Watson, John Hamish, 228, 391.
Waugh, Frederick Vail, 599, 604.
Weaver, Warren, 599.
Weisner, Louis, 501, 600.
Wermuth, Edgar Martin Emil, 577.
Weyl, Claus Hugo Hermann, 87, 600.
Wham-O, 421, 429.
Wheel, 74, 360.
big, 75.
of Fortune, 439.
Whidden, Samuel Blackwell, viii.
Whipple, Francis John Welsh, 600, 603.
identity, 241.
Whitehead, Alfred North, 91, 489, 577, 600.
Wilf, Herbert Saul, 81, 500, 600, 603.
Williams, Hugh Cowie, 600, 602.
Wilquin, Denys, 603.
Wilson, Sir John, theorem, 132, 148, 501,
582.
Wilson, Martha, 148.
Wine, 419.
Witty, Carl Roger, 494.
Wolstenholme, Joseph, 600, 604.
theorem, 531.
Wood, Derick, 600, 602.
Woods, Donald Roy, 598.
Woolf, William Blauvelt, viii.
Worm, and apple, 416.
on rubber band, 260-261, 264, 298, 479.
Worpitzky, Julius Daniel Theodor, 600.
identity, 255.
Wreath, 500.
Wrench, John William, Jr., 574, 580, 605.
Wright, Edward Maitland, 111, 589, 600,
602.
Wythoff (= Wijthoff), W.A., 586.
Yao, Andrew Chi-Chih, ix, 601.
Yao, Foong Frances, ix, 601.
Youngman, Henry (= Henny), 175.
Zag, see Zig.
Zapf, Hermann, viii, 600, 625.
Zave, Derek Alan, 600, 604.
Zeckendorf, Edouard, 600.
theorem, 281, 538.
Zero, not considered harmful, 24-25, 159.
strongly, 24.
Zeta function, 65, 263-264, 272, 356-357,
449, 511, 542, 547, 569, 571, 575.
Zig, 7, 19.
Zig-zag, 19, 485.
Zipf, George Kingsley, law, 405.
0^0, 162.
√2, 100.
√3, 364.
⇔ (if and only if), 68.
⇒ (implies), 71.
... (ellipsis), 21, 50, 108, ....
List of Tables
Sums and differences 55
Pascal’s triangle 155
Pascal’s triangle extended upward 164
Sums of products of binomial coefficients 169
The top ten binomial coefficient identities 174
General convolution identities 202
Stirling’s triangle for subsets 244
Stirling’s triangle for cycles 245
Basic Stirling number identities 250
Additional Stirling number identities 251
Stirling’s triangles in tandem 253
Euler’s triangle 254
Second-order Eulerian triangle 256
Stirling convolution formulas 258
Generating function manipulations 320
Simple sequences and their generating functions 321
Generating functions for special numbers 337
Asymptotic approximations 438
THIS BOOK was composed at Stanford University using the TeX system
for technical text developed by D. E. Knuth. The mathematics is set in a
new typeface called AMS Euler, designed by Hermann Zapf for the American
Mathematical Society. The text is set in a new typeface called Concrete Roman
and Italic, a special version of Knuth’s Computer Modern family with
weights designed to blend with AMS Euler. The paper is 50-lb.-basis Finch
offset, which has a neutral pH and a life expectancy of several hundred years.
The offset printing and notch binding were done by Halliday Lithograph
Corporation in Hanover, Massachusetts.